Funding: Financially supported by the National Natural Science Foundation of China (No. 22209057) and the Guangzhou Basic and Applied Basic Research Foundation (No. 2024A04J0839).
Abstract: Thanks to their abundant reserves, relatively high energy density, and low reduction potential, potassium-ion batteries (PIBs) have high potential for large-scale energy storage applications. Because of the large radius of potassium ions, most conventional anode materials undergo severe volume expansion, making it difficult to achieve stable and reversible energy storage. Therefore, developing high-performance anode materials is one of the critical factors in advancing PIBs. In this sense, antimony (Sb)-based anode materials, with their high theoretical capacity and safe reaction potentials, have broad application potential in PIBs. However, overcoming the rapid capacity decay induced by the large radius of potassium ions remains an issue that needs attention. This paper reviews the latest research on different types of Sb-based anode materials and provides an in-depth analysis of their optimization strategies. We focus on material selection, structural design, and storage mechanisms to give a detailed description of these materials. In addition, the current challenges still faced by Sb-based anode materials are summarized, and some further optimization strategies are proposed. We hope to provide insights for researchers developing Sb-based anode materials for next-generation advanced PIBs.
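The theoretical capacity figures quoted for alloying anodes such as Sb follow directly from Faraday's law; a minimal sketch of the arithmetic, assuming full potassiation of Sb to K3Sb (the commonly cited end phase), is shown below.

```python
# Gravimetric theoretical capacity from Faraday's law: C = n * F / (3.6 * M)  [mAh/g]
# Assumes Sb alloys fully to K3Sb, i.e., n = 3 electrons per Sb atom.
F = 96485.0    # Faraday constant, C/mol
M_SB = 121.76  # molar mass of Sb, g/mol
n = 3          # electrons transferred per Sb atom

capacity_mAh_per_g = n * F / (3.6 * M_SB)
print(f"Theoretical capacity of Sb: {capacity_mAh_per_g:.0f} mAh/g")  # ~660 mAh/g
```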
Funding: Supported by the Deanship of Scientific Research and Innovation at Al-Balqa Applied University in Jordan.
Abstract: Feature selection (FS) plays a crucial role in pre-processing machine learning datasets, as it eliminates redundant features to improve classification accuracy and reduce computational costs. This paper presents an enhanced approach to FS for software fault prediction, specifically by enhancing the binary dwarf mongoose optimization (BDMO) algorithm with a crossover mechanism and a modified position-updating formula. The proposed approach, termed iBDMOcr, aims to fortify exploration capability, promote population diversity, and improve the wrapper-based FS process for software fault prediction tasks. iBDMOcr achieved superb performance compared with other well-esteemed optimization methods across 17 benchmark datasets. It ranked first in 11 out of 17 datasets in terms of average classification accuracy. Moreover, iBDMOcr outperformed other methods in terms of average fitness values and number of selected features across all datasets. The findings demonstrate the effectiveness of iBDMOcr in addressing FS problems in software fault prediction, leading to more accurate and efficient models.
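The abstract does not reproduce the iBDMOcr update equations, but the general shape of a binary wrapper-based FS metaheuristic (binary masks, a crossover step, and a classifier-driven fitness) can be sketched as follows; the operators here are generic stand-ins, not the paper's exact formulas.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_agents, n_feat, n_iter, alpha = 10, X.shape[1], 20, 0.99

def fitness(mask):
    # Wrapper fitness: classifier accuracy traded off against subset size.
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask.astype(bool)], y, cv=3).mean()
    return alpha * acc + (1 - alpha) * (1 - mask.sum() / n_feat)

pop = rng.integers(0, 2, size=(n_agents, n_feat))   # binary feature masks
fit = np.array([fitness(m) for m in pop])
for _ in range(n_iter):
    best = pop[fit.argmax()]
    for i in range(n_agents):
        # Uniform crossover with the current best (stands in for iBDMOcr's operator).
        child = np.where(rng.random(n_feat) < 0.5, pop[i], best)
        child ^= (rng.random(n_feat) < 0.05)  # bit-flip mutation keeps diversity
        f = fitness(child)
        if f > fit[i]:
            pop[i], fit[i] = child, f
print("best fitness:", fit.max(), "| features kept:", int(pop[fit.argmax()].sum()))
```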
Funding: Supported by the Guangdong Province Basic and Applied Basic Research Fund Project (No. 2020A1515110826), the National Natural Science Foundation of China (No. 42006115), and the Major Scientific and Technological Projects of Hainan Province (No. ZDKJ2021036).
Abstract: Manganese superoxide dismutase (MnSOD) is an antioxidant that exists in mitochondria and can effectively remove superoxide anions there. In the dark, high-pressure, and low-temperature deep-sea environment, MnSOD is essential for the survival of sea cucumbers. Six MnSODs were identified from the transcriptomes of deep-sea and shallow-sea sea cucumbers. To explore their environmental adaptation mechanism, we conducted environmental selection pressure analysis using the branch-site model of the PAML software. We obtained nine positive selection sites, two of which were significant (97F→H, 134K→V): 97F→H is located in a highly conserved characteristic sequence, and its polarity change might have a great impact on the function of MnSOD; 134K→V showed a change in piezophilic ability, which might help MnSOD adapt to the high hydrostatic pressure of the deep sea. To further study the effect of these two positive selection sites on MnSOD, we predicted the point mutations F97H and K134V in shallow-sea sea cucumber using MAESTROweb and PyMOL. The results show that 97F→H and 134K→V might improve MnSOD's efficiency in scavenging superoxide anions and its ability to resist high hydrostatic pressure by moderately reducing its stability. These results indicate that the MnSODs of deep-sea sea cucumbers adapted to deep-sea environments through amino acid changes affecting polarity, piezophilic behavior, and local stability. This study reveals the correlation between MnSOD and extreme environments and will help improve our understanding of organisms' adaptation mechanisms in the deep sea.
Abstract: The rapid rise of cyberattacks and the gradual failure of traditional defense systems and approaches have led to the use of artificial intelligence (AI) techniques (such as machine learning (ML) and deep learning (DL)) to build more efficient and reliable intrusion detection systems (IDSs). However, the advent of larger IDS datasets has negatively impacted the performance and computational complexity of AI-based IDSs. Many researchers have used data preprocessing techniques such as feature selection and normalization to overcome such issues. While most of these researchers reported the success of these preprocessing techniques at a shallow level, very few studies have examined their effects on a wider scale. Furthermore, the performance of an IDS model depends not only on the preprocessing techniques used but also on the dataset and the ML/DL algorithm, which most existing studies place little emphasis on. Thus, this study provides an in-depth analysis of feature selection and normalization effects on IDS models built using three IDS datasets (NSL-KDD, UNSW-NB15, and CSE-CIC-IDS2018) and various AI algorithms. A wrapper-based approach, which tends to give superior performance, and min-max normalization were used for feature selection and normalization, respectively. Numerous IDS models were implemented using the full and feature-selected copies of the datasets, with and without normalization. The models were evaluated using popular IDS evaluation metrics, and intra- and inter-model comparisons were performed between models and with state-of-the-art works. Random forest (RF) models performed best on the NSL-KDD and UNSW-NB15 datasets, with accuracies of 99.86% and 96.01%, respectively, whereas an artificial neural network (ANN) achieved the best accuracy of 95.43% on the CSE-CIC-IDS2018 dataset. The RF models also achieved excellent performance compared with recent works. The results show that normalization and feature selection positively affect IDS modeling. Furthermore, while feature selection benefits simpler algorithms (such as RF), normalization is more useful for complex algorithms like ANNs and deep neural networks (DNNs), and algorithms such as Naive Bayes are unsuitable for IDS modeling. The study also found that the UNSW-NB15 and CSE-CIC-IDS2018 datasets are more complex and more suitable for building and evaluating modern-day IDSs than the NSL-KDD dataset. Our findings suggest that prioritizing robust algorithms like RF, alongside complex models such as ANN and DNN, can significantly enhance IDS performance. These insights provide valuable guidance for developing more effective security measures that focus on high detection rates and low false alert rates.
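As an illustration of the preprocessing pipeline the study varies, a minimal sketch combining min-max normalization, a wrapper-style selector, and an RF classifier might look like the following; it runs on a synthetic stand-in dataset, not the actual IDS benchmarks.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for an IDS dataset (the real study uses NSL-KDD etc.).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

pipe = Pipeline([
    ("scale", MinMaxScaler()),  # min-max normalization, as in the study
    # Greedy forward selection with the classifier in the loop = wrapper-based FS.
    ("select", SequentialFeatureSelector(
        RandomForestClassifier(n_estimators=50, random_state=42),
        n_features_to_select=10, cv=3)),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=42)),
])
pipe.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, pipe.predict(X_te)))
```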
Funding: Supported by the National Natural Science Foundation of China (42250101) and the Macao Foundation.
Abstract: Earth's internal core and crustal magnetic fields, as measured by geomagnetic satellites like MSS-1 (Macao Science Satellite-1) and Swarm, are vital for understanding core dynamics and tectonic evolution. To model these internal magnetic fields accurately, data selection based on specific criteria is often employed to minimize the influence of rapidly changing current systems in the ionosphere and magnetosphere. However, the quantitative impact of various data selection criteria on internal geomagnetic field modeling is not well understood. This study aims to address this issue and provide a reference for constructing and applying geomagnetic field models. First, we collect the latest MSS-1 and Swarm satellite magnetic data and summarize widely used data selection criteria in geomagnetic field modeling. Second, we briefly describe the method to co-estimate the core, crustal, and large-scale magnetospheric fields using satellite magnetic data. Finally, we conduct a series of field modeling experiments with different data selection criteria to quantitatively estimate their influence. Our numerical experiments confirm that, without selecting data from dark regions and geomagnetically quiet times, the resulting internal field differences at the Earth's surface can range from tens to hundreds of nanotesla (nT). Additionally, we find that the uncertainties introduced into field models by different data selection criteria are significantly larger than the measurement accuracy of modern geomagnetic satellites. These uncertainties should be considered when utilizing constructed magnetic field models for scientific research and applications.
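Selection criteria of this kind (dark side, magnetically quiet conditions) reduce to simple row filters on the satellite data. A schematic sketch follows; the column names and thresholds (e.g., Kp ≤ 2, sun at least 10 degrees below the horizon) are illustrative assumptions, not the exact criteria used in the study.

```python
import pandas as pd

# Hypothetical satellite data table; columns and thresholds are illustrative only.
df = pd.DataFrame({
    "sun_elevation_deg": [-25.0, 5.0, -12.0, -30.0],  # sun position at sample point
    "kp_index":          [1.0, 1.3, 3.7, 0.7],        # global geomagnetic activity
    "dst_nT":            [-8.0, -5.0, -42.0, -3.0],   # ring-current index
})

quiet_and_dark = (
    (df["sun_elevation_deg"] <= -10.0)  # dark region: avoids strong ionospheric currents
    & (df["kp_index"] <= 2.0)           # geomagnetically quiet time
    & (df["dst_nT"].abs() <= 30.0)      # weak magnetospheric ring current
)
selected = df[quiet_and_dark]
print(f"kept {len(selected)} of {len(df)} samples")
```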
Funding: Supported by the CAS Project for Young Scientists in Basic Research under Grant YSBR-035 and the Jiangsu Provincial Key Research and Development Program under Grant BE2021013-2.
Abstract: In covert communications, joint jammer selection and power optimization are important for improving performance. However, existing schemes usually assume a warden with a known location and perfect channel state information (CSI), which is difficult to achieve in practice. To be more practical, it is important to investigate covert communications against a warden with an uncertain location and imperfect CSI, which makes it difficult for legitimate transceivers to estimate the warden's detection probability. First, the uncertainty caused by the unknown warden location must be removed, so the optimal detection position (OPTDP) of the warden, which provides the best detection performance (i.e., the worst case for covert communication), is derived. Then, to further avoid the impractical assumption of perfect CSI, the covert throughput is maximized using only the channel distribution information. Given this OPTDP-based worst case for covert communications, the jammer selection, jamming power, transmission power, and transmission rate are jointly optimized to maximize the covert throughput (OPTDP-JP). To solve this coupled problem, a heuristic algorithm based on the maximum distance ratio (H-MAXDR) is proposed to provide a sub-optimal solution. First, according to the analysis of the covert throughput, the node with the maximum distance ratio (i.e., the ratio of the distance from the jammer to the receiver to that from the jammer to the warden) is selected as the friendly jammer (MAXDR). Then, the optimal transmission and jamming power can be derived, followed by the optimal transmission rate obtained via the bisection method. Numerical and simulation results show that although the location of the warden is unknown, by assuming the OPTDP of the warden, the proposed OPTDP-JP can always satisfy the covertness constraint. In addition, with an uncertain warden and imperfect CSI, the covert throughput provided by OPTDP-JP is 80% higher than that of existing schemes when the covertness constraint is 0.9, showing the effectiveness of OPTDP-JP.
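The MAXDR selection rule itself is a one-liner over candidate jammer positions. A small sketch with made-up coordinates follows, assuming (as described above) that the ratio is distance-to-receiver over distance-to-warden.

```python
import math

receiver = (0.0, 0.0)
warden = (8.0, 6.0)  # here placed at its assumed optimal detection position (OPTDP)
candidates = {"J1": (2.0, 9.0), "J2": (7.0, 7.0), "J3": (-4.0, 1.0)}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# MAXDR: pick the candidate maximizing d(jammer, receiver) / d(jammer, warden),
# i.e., far from the legitimate receiver but close to the warden.
best = max(candidates,
           key=lambda j: dist(candidates[j], receiver) / dist(candidates[j], warden))
for j, pos in candidates.items():
    print(j, "distance ratio =", round(dist(pos, receiver) / dist(pos, warden), 2))
print("selected friendly jammer:", best)
```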
Funding: Supported by the National Natural Science Foundation of China (32160782 and 32060737).
Abstract: The principle of genomic selection (GS) entails estimating breeding values (BVs) by summing all the SNP polygenic effects. Visible/near-infrared spectroscopy (VIS/NIRS) wavelengths and abundance values can directly reflect the concentrations of chemical substances, and the measurement of meat traits by VIS/NIRS is similar to the processing of genomic selection data, in that it sums all the 'polygenic effects' associated with spectral feature peaks. Therefore, it is meaningful to investigate the incorporation of VIS/NIRS information into GS models to establish an efficient and low-cost breeding model. In this study, we measured 6 meat quality traits in 359 Duroc×Landrace×Yorkshire pigs from Guangxi Zhuang Autonomous Region, China, and genotyped them with high-density SNP chips. According to the completeness of the information for the target population, we proposed 4 breeding strategies applied to different scenarios: Ⅰ, only spectral and genotypic data exist for the target population; Ⅱ, only spectral data exist for the target population; Ⅲ, only spectral and genotypic data, but with different prediction processes, exist for the target population; and Ⅳ, only spectral and phenotypic data exist for the target population. The 4 scenarios were used to evaluate the genomic estimated breeding value (GEBV) accuracy gained by adding VIS/NIR spectral information. In the 5-fold cross-validation results, the genetic algorithm showed remarkable potential for preselection of feature wavelengths. The breeding efficiency of Strategies Ⅱ, Ⅲ, and Ⅳ was superior to that of traditional GS for most traits, and the GEBV prediction accuracy was improved by 32.2, 40.8, and 15.5%, respectively, on average. Among them, the prediction accuracy of Strategy Ⅱ for fat (%) improved by as much as 50.7% compared with traditional GS. The GEBV prediction accuracy of Strategy Ⅰ was nearly identical to that of traditional GS, and the fluctuation range was less than 7%. Moreover, the breeding cost of the 4 strategies was lower than that of traditional GS methods, with Strategy Ⅳ being the lowest, as it did not require genotyping. Our findings demonstrate that GS methods based on VIS/NIRS data have significant predictive potential and are worthy of further research, providing a valuable reference for the development of effective and affordable breeding strategies.
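The core computation behind "summing all polygenic effects" is a penalized linear model over SNP (or spectral) predictors. Below is a toy sketch of GEBV-style prediction with ridge regression on simulated markers, with and without simulated spectral features; it illustrates the idea only and is not the paper's exact models or data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n, n_snp, n_spec = 300, 500, 100

# Simulated genotypes (0/1/2 allele counts) and spectra correlated with the trait.
G = rng.integers(0, 3, size=(n, n_snp)).astype(float)
effects = rng.normal(0, 0.1, n_snp)
tbv = G @ effects                                   # true breeding values
S = tbv[:, None] * rng.normal(1, 0.3, (n, n_spec)) + rng.normal(0, 1, (n, n_spec))
y = tbv + rng.normal(0, 1.0, n)                     # phenotype = TBV + noise

for name, X in [("SNP only (traditional GS)", G), ("SNP + spectra", np.hstack([G, S]))]:
    # Ridge regression plays the role of a SNP-BLUP-style shrinkage estimator.
    r2 = cross_val_score(Ridge(alpha=10.0), X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r2:.3f}")
```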
Abstract: This article constructs statistical selection procedures for exponential populations that may differ only in their threshold parameters. The scale parameters of the populations are assumed common and known. The independent samples drawn from the populations are taken to be of the same size. The best population is defined as the one associated with the largest threshold parameter. In case more than one population shares the largest threshold, one of these is tagged at random and denoted the best. Two procedures are developed for choosing a subset of the populations with the property that the chosen subset contains the best population with a prescribed probability. One procedure is based on the sample minimum values drawn from the populations, and another is based on the sample means. An "Indifference Zone" (IZ) selection procedure is also developed based on the sample minimum values. The IZ procedure asserts that the population with the largest test statistic (e.g., the sample minimum) is the best population. With this approach, the sample size is chosen so as to guarantee that the probability of a correct selection is no less than a prescribed probability in the parameter region where the largest threshold is at least a prescribed amount larger than the remaining thresholds. Numerical examples are given, and the computer R codes for all calculations are given in the Appendices.
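The IZ rule (assert that the population with the largest sample minimum is best) is easy to probe by simulation. The Monte Carlo sketch below estimates the probability of correct selection under an assumed threshold gap delta; the parameters are illustrative, not the article's numerical examples.

```python
import numpy as np

rng = np.random.default_rng(7)
k, n, reps = 4, 20, 20_000
scale = 1.0        # common, known scale parameter
delta = 0.5        # IZ configuration: best threshold exceeds the rest by delta
thresholds = np.array([0.0, 0.0, 0.0, delta])   # last population is the true best

correct = 0
for _ in range(reps):
    # Two-parameter exponential samples: threshold + Exp(scale).
    samples = thresholds[:, None] + rng.exponential(scale, size=(k, n))
    # IZ rule: select the population with the largest sample minimum.
    if samples.min(axis=1).argmax() == k - 1:
        correct += 1
print(f"estimated P(correct selection) = {correct / reps:.3f}")
```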
Funding: The authors thank the Deanship of Scientific Research at King Khalid University for funding this work through the Large Group Research Project under grant number RGP2/421/45. Also supported via funding from Prince Sattam bin Abdulaziz University, project number PSAU/2024/R/1446; by the Researchers Supporting Project Number (UM-DSR-IG-2023-07), Almaarefa University, Riyadh, Saudi Arabia; and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (No. 2021R1F1A1055408).
Abstract: Machine learning (ML) is increasingly applied to medical image processing with appropriate learning paradigms. These applications include analyzing images of various organs, such as the brain, lung, and eye, to identify specific flaws/diseases for diagnosis. The primary concern of ML applications is the precise selection of flexible image features for pattern detection and region classification. Many extracted image features are irrelevant and increase computation time. Therefore, this article uses an analytical learning paradigm to design a congruent feature selection method that selects the most relevant image features. This process trains the learning paradigm using similarity- and correlation-based features over different textural intensities and pixel distributions. The similarity between pixels across the various distribution patterns with high indexes is recommended for disease diagnosis. Then, the correlation based on intensity and distribution is analyzed to improve the feature selection congruency. The most congruent pixels are sorted in descending order of selection, which identifies better regions than the raw distribution. The learning paradigm is then trained using intensity- and region-based similarity to maximize the chances of selection. Therefore, the probability of feature selection, regardless of the textures and medical image patterns, is improved. This process enhances the performance of ML applications for different medical image processing tasks. The proposed method improves the accuracy, precision, and training rate by 13.19%, 10.69%, and 11.06%, respectively, compared with other models on the selected dataset. The mean error and selection time are also reduced by 12.56% and 13.56%, respectively, compared with the same models and dataset.
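The abstract leaves the "congruent" scoring unspecified, so as a generic stand-in, the sketch below ranks features by combining a relevance term (correlation with the label) with a redundancy penalty (similarity to already-chosen features). This is a common pattern for similarity/correlation-based selection, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 40))   # 200 images x 40 extracted features (toy data)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(float)

def corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

selected: list[int] = []
for _ in range(5):               # greedily pick 5 features
    best_j, best_score = -1, -np.inf
    for j in range(X.shape[1]):
        if j in selected:
            continue
        relevance = corr(X[:, j], y)
        # Redundancy: mean similarity to the features already selected.
        redundancy = np.mean([corr(X[:, j], X[:, k]) for k in selected]) if selected else 0.0
        score = relevance - redundancy   # "congruency"-style trade-off (illustrative)
        if score > best_score:
            best_j, best_score = j, score
    selected.append(best_j)
print("selected feature indices:", selected)
```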
Funding: Funded by the Natural Science Foundation of China (Grant Nos. 42377164 and 41972280) and the Badong National Observation and Research Station of Geohazards (Grant No. BNORSG-202305).
Abstract: Landslide susceptibility prediction (LSP) is significantly affected by the uncertainty inherent in the selection of landslide-related conditioning factors. However, most of the literature only performs comparative studies on a certain conditioning factor selection method rather than systematically studying this uncertainty issue. To this end, this study aims to systematically explore how various commonly used conditioning factor selection methods influence LSP and, on this basis, to propose a principle with universal applicability for the optimal selection of conditioning factors. An'yuan County in southern China is taken as an example, considering 431 landslides and 29 types of conditioning factors. Five commonly used factor selection methods, namely correlation analysis (CA), linear regression (LR), principal component analysis (PCA), rough set (RS), and artificial neural network (ANN), are applied to select the optimal factor combinations from the original 29 conditioning factors. The factor selection results are then used as inputs to four types of common machine learning models to construct 20 types of combined models, such as CA-multilayer perceptron and CA-random forest. Additionally, multifactor-based multilayer perceptron and random forest models, which select conditioning factors based on the proposed principle of "accurate data, rich types, clear significance, feasible operation, and avoiding duplication", are constructed for comparison. Finally, the LSP uncertainties are evaluated by accuracy, susceptibility index distribution, etc. Results show that: (1) multifactor-based models generally have higher LSP performance and lower uncertainties than factor-selection-based models; (2) the influence of the machine learning model on LSP accuracy is greater than that of the factor selection method. In conclusion, the commonly used conditioning factor selection methods above are not ideal for improving LSP performance and may complicate the LSP process. In contrast, a satisfactory combination of conditioning factors can be constructed according to the proposed principle.
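As one concrete example of the factor selection step, a correlation-analysis (CA) style filter that drops one factor from every highly correlated pair before model training can be sketched as follows; the factors are synthetic and the 0.8 threshold is an illustrative assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 431                          # one row per sample point
base = rng.normal(size=(n, 5))
factors = pd.DataFrame({
    "slope": base[:, 0], "elevation": base[:, 1], "rainfall": base[:, 2],
    "ndvi": base[:, 3], "road_density": base[:, 4],
    "relief": base[:, 0] * 0.95 + rng.normal(0, 0.1, n),  # nearly duplicates slope
})

def ca_filter(df: pd.DataFrame, threshold: float = 0.8) -> list[str]:
    """Keep a factor only if its |Pearson r| with every kept factor is below threshold."""
    corr = df.corr().abs()
    keep: list[str] = []
    for col in df.columns:
        if all(corr.loc[col, k] < threshold for k in keep):
            keep.append(col)
    return keep

print(ca_filter(factors))   # 'relief' is removed as a near-duplicate of 'slope'
```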
Abstract: One must interact with a specific webpage or website in order to use the Internet for communication, teamwork, and other productive activities. However, because phishing websites look benign, and not all website visitors have the knowledge and skills to inspect the trustworthiness of the websites they visit, users are tricked into disclosing sensitive information, making them vulnerable to malicious software attacks like ransomware. It is impossible to stop attackers from creating phishing websites, which is one of the core challenges in combating them. However, this threat can be alleviated by detecting a specific website as phishing and alerting online users to take the necessary precautions before handing over sensitive information. In this study, five machine learning (ML) and deep learning (DL) algorithms, namely CatBoost (CATB), gradient boost (GB), random forest (RF), multilayer perceptron (MLP), and deep neural network (DNN), were tested with three reputable datasets and two useful feature selection techniques to assess the scalability and consistency of each classifier's performance on varied dataset sizes. The experimental findings reveal that the CATB classifier achieved the best accuracy across all datasets (DS-1, DS-2, and DS-3), with respective values of 97.9%, 95.73%, and 98.83%. The GB classifier achieved the second-best accuracy across all datasets, with respective values of 97.16%, 95.18%, and 98.58%. MLP achieved the best computational time across all datasets, with respective values of 2, 7, and 3 seconds, despite scoring the lowest accuracy across all datasets.
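A comparison of this kind reduces to a loop that times and scores each classifier on each dataset. The compact sklearn-only sketch below uses synthetic data in place of DS-1 to DS-3 and GB/RF/MLP stand-ins rather than the CatBoost package.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-ins for the three phishing datasets.
datasets = {f"DS-{i}": make_classification(n_samples=1500, n_features=30,
                                           random_state=i) for i in (1, 2, 3)}
models = {
    "GB": GradientBoostingClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
}
for ds_name, (X, y) in datasets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for m_name, model in models.items():
        t0 = time.perf_counter()
        model.fit(X_tr, y_tr)
        acc = accuracy_score(y_te, model.predict(X_te))
        print(f"{ds_name} {m_name}: acc={acc:.4f}, train_time={time.perf_counter() - t0:.2f}s")
```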
Abstract: In this study, we examine the problem of sliced inverse regression (SIR), a widely used method for sufficient dimension reduction (SDR). It was designed to find reduced-dimensional versions of multivariate predictors by replacing them with a minimally adequate collection of their linear combinations without loss of information. Recently, regularization methods have been proposed in SIR to incorporate a sparse structure of predictors for better interpretability. However, existing methods use convex relaxation to bypass the sparsity constraint, which may not lead to the best subset and, in particular, tends to include irrelevant variables when predictors are correlated. In this study, we approach sparse SIR as a nonconvex optimization problem and directly tackle the sparsity constraint by establishing the optimality conditions and iteratively solving them by means of the splicing technique. Without employing convex relaxation on the sparsity constraint or the orthogonality constraint, our algorithm exhibits superior empirical merits, as evidenced by extensive numerical studies. Computationally, our algorithm is much faster than the relaxed approach for the natural sparse SIR estimator. Statistically, our algorithm surpasses existing methods in accuracy for central subspace estimation and best subset selection, and sustains high performance even with correlated predictors.
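For context, classical (non-sparse) SIR estimates the central subspace from the eigenvectors of the between-slice covariance of standardized predictors. The compact numpy sketch below implements textbook SIR on simulated data, not the paper's splicing algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, H = 2000, 8, 10                            # samples, predictors, number of slices
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:2] = [1.0, -1.0]       # true direction lives in coordinates 0, 1
y = (X @ beta) ** 3 + 0.5 * rng.normal(size=n)   # monotone link, so SIR applies

# 1) Center and whiten the predictors.
Xc = X - X.mean(axis=0)
Sigma = Xc.T @ Xc / n
L = np.linalg.cholesky(np.linalg.inv(Sigma))     # L @ L.T = Sigma^{-1}
Z = Xc @ L                                       # whitened predictors

# 2) Slice the data on y and take within-slice means of Z.
slices = np.array_split(np.argsort(y), H)
means = np.array([Z[idx].mean(axis=0) for idx in slices])
weights = np.array([len(idx) / n for idx in slices])

# 3) Eigen-decompose the between-slice covariance M = sum_h w_h m_h m_h^T.
M = (means * weights[:, None]).T @ means
vals, vecs = np.linalg.eigh(M)
d_hat = L @ vecs[:, -1]                          # leading direction, back in X coordinates
print(np.round(d_hat / np.linalg.norm(d_hat), 2))   # ~ +/-[0.71, -0.71, 0, ...]
```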
Funding: Supported by the National Natural Science Foundation of China (32270670, 32288101, 32271186, and 32200482), the National Basic Research Program of China (2015FY111700), and the CAMS Innovation Fund for Medical Sciences (2019-I2M-5-066).
Abstract: Mitochondria play a key role in lipid metabolism, and mitochondrial DNA (mtDNA) mutations are thus considered to affect obesity susceptibility by altering oxidative phosphorylation and mitochondrial function. In this study, we investigate mtDNA variants that may affect obesity risk in 2877 Han Chinese individuals from 3 independent populations. The association analysis of 16 basal mtDNA haplogroups with body mass index, waist circumference, and waist-to-hip ratio reveals that only haplogroup M7 is significantly negatively correlated with all three adiposity-related anthropometric traits in the overall cohort, as verified by the analysis of a single population, i.e., the Zhengzhou population. Furthermore, subhaplogroup analysis suggests that M7b1a1 is the most likely haplogroup associated with a decreased obesity risk, and the variation T12811C (causing Y159H in ND5) harbored in M7b1a1 may be the most likely candidate for altering mitochondrial function. Specifically, we find that proportionally more nonsynonymous mutations accumulate in M7b1a1 carriers, indicating that M7b1a1 is either under positive selection or subject to a relaxation of selective constraints. We also find that nuclear variants, especially in DACT2 and PIEZO1, may functionally interact with M7b1a1.
Funding: Supported by the National Natural Science Foundation of China (Nos. 22174014 and 22074015).
Abstract: Compared with natural enzymes, nanozymes have the advantages of high stability and low cost; however, selectivity and sensitivity are key issues that prevent their further development. In this study, we report a cascade nanozymatic system with significantly improved selectivity and sensitivity that combines more substrate-specific reactions with sensitive fluorescence detection. Taking the detection of ascorbic acid (AA) as an example, a cascade catalytic reaction system consisting of oxidase-like N-doped carbon nanocages (NC) and peroxidase-like copper oxide (CuO) improved the reaction selectivity in transforming the substrate into the target product by more than 1200 times against the interference of uric acid. The cascade catalytic reaction system was also amenable to transfer from open reactors into a spatially confined microfluidic device, increasing the slope of the calibration curves by approximately 1000-fold, with a linear detection range of 2.5 nmol/L to 100 nmol/L and a low limit of detection of 0.77 nmol/L. This work offers a new strategy that achieves significant improvements in selectivity and sensitivity.
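Figures of merit like these come straight from a linear calibration fit. The short sketch below computes the calibration slope and the common 3.3·sigma/slope estimate of the limit of detection from made-up readings; the numbers are illustrative, not the paper's data.

```python
import numpy as np

# Hypothetical calibration data: analyte concentration (nmol/L) vs. fluorescence signal.
conc = np.array([2.5, 5.0, 10.0, 25.0, 50.0, 100.0])
signal = np.array([13.1, 25.8, 51.9, 128.4, 255.2, 511.7])
blank_replicates = np.array([0.9, 1.2, 0.8, 1.1, 1.0, 0.9, 1.3])  # blank measurements

slope, intercept = np.polyfit(conc, signal, 1)   # least-squares calibration line
sigma_blank = blank_replicates.std(ddof=1)
lod = 3.3 * sigma_blank / slope                  # common LOD definition (IUPAC-style)
print(f"slope = {slope:.2f} signal units per nmol/L, LOD ~ {lod:.2f} nmol/L")
```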
Abstract: Non-orthogonal multiple access (NOMA) is a promising technology for next-generation wireless communication networks. The benefits of this technology can be further enhanced through deployment in conjunction with multiple-input multiple-output (MIMO) systems. Antenna selection plays a critical role in MIMO-NOMA systems, as it has the potential to significantly reduce the cost and complexity associated with radio frequency chains. This paper considers antenna selection for downlink MIMO-NOMA networks with a multiple-antenna base station (BS) and multiple-antenna user equipments (UEs). An iterative antenna selection scheme is developed for a two-user system, and to determine the initial power required for this selection scheme, a power estimation method is also proposed. The proposed algorithm is then extended to a general multiuser NOMA system. Numerical results demonstrate that the proposed antenna selection algorithm achieves near-optimal performance with much lower computational complexity in both two-user and multiuser scenarios.
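The complexity argument for antenna selection is easy to see in a toy comparison between exhaustive subset search and a greedy heuristic. The sketch below uses a generic log-det (capacity-style) objective and random Rayleigh channels as assumptions; it is not the paper's specific scheme.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n_bs, n_ue, k = 8, 2, 3   # BS antennas, receive antennas, antennas to select
# Rayleigh-fading channel between the BS array and the stacked UE antennas.
H = (rng.normal(size=(n_ue, n_bs)) + 1j * rng.normal(size=(n_ue, n_bs))) / np.sqrt(2)

def rate(cols, snr=10.0):
    # Generic log-det objective over the selected BS antenna columns.
    Hs = H[:, list(cols)]
    return float(np.log2(np.linalg.det(np.eye(n_ue) + snr * Hs @ Hs.conj().T).real))

# Exhaustive search: optimal, but the number of subsets grows combinatorially.
best = max(itertools.combinations(range(n_bs), k), key=rate)

# Greedy selection: pick one antenna at a time; near-optimal at far lower cost.
chosen: list[int] = []
for _ in range(k):
    chosen.append(max((a for a in range(n_bs) if a not in chosen),
                      key=lambda a: rate(chosen + [a])))

print("exhaustive:", sorted(best), f"rate = {rate(best):.3f} bit/s/Hz")
print("greedy:    ", sorted(chosen), f"rate = {rate(chosen):.3f} bit/s/Hz")
```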
Funding: Supported by Ho Chi Minh City Open University, Vietnam, under grant number E2024.02.1CD, and by Suan Sunandha Rajabhat University, Thailand.
Abstract: The Financial Technology (FinTech) sector has witnessed rapid growth, resulting in increasingly complex and high-volume digital transactions. Although this expansion improves efficiency and accessibility, it also introduces significant vulnerabilities, including fraud, money laundering, and market manipulation. Traditional anomaly detection techniques often fail to capture the relational and dynamic characteristics of financial data. Graph neural networks (GNNs), capable of modeling intricate interdependencies among entities, have emerged as a powerful framework for detecting subtle and sophisticated anomalies. However, the high dimensionality and inherent noise of FinTech datasets demand robust feature selection strategies to improve model scalability, performance, and interpretability. This paper presents a comprehensive survey of GNN-based approaches for anomaly detection in FinTech, with an emphasis on the synergistic role of feature selection. We examine the theoretical foundations of GNNs, review state-of-the-art feature selection techniques, analyze their integration with GNNs, and categorize prevalent anomaly types in FinTech applications. In addition, we discuss practical implementation challenges, highlight representative case studies, and propose future research directions to advance the field of graph-based anomaly detection in financial systems.
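The theoretical foundations in question center on message passing. As an illustration, a single graph-convolution step, the standard Kipf-Welling propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), can be written in a few lines of numpy; the toy transaction graph below is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy transaction graph: 5 accounts (nodes), edges = "transacted with".
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [1, 0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(5, 4))   # 4 input features per node (e.g., after feature selection)
W = rng.normal(size=(4, 2))   # learnable weights: 4 -> 2 hidden units

# One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)  (Kipf & Welling, 2017)
A_hat = A + np.eye(5)                       # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
H_next = np.maximum(A_norm @ H @ W, 0.0)
print(H_next.shape)                         # (5, 2): each node now encodes its neighborhood
```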