Funding: Supported by the Natural Science Foundation of Liaoning Province under Grant 2021-MS-272 and the Educational Committee Project of Liaoning Province under Grant LJKQZ2021088.
Abstract: Feature selection (FS) is an important preprocessing step in data mining that removes redundant or irrelevant features from high-dimensional data. Most optimization algorithms for FS problems are not well balanced in their search. This paper proposes a hybrid algorithm, the nonlinear binary grasshopper whale optimization algorithm (NL-BGWOA), to address this problem. The proposed method introduces a new position-updating strategy that combines the position changes of the whale and grasshopper populations, which improves search diversity in the target domain. Ten distinct high-dimensional UCI datasets, the multi-modal Parkinson's speech datasets, and the COVID-19 symptom dataset are used to validate the method. NL-BGWOA performs well on most of the high-dimensional datasets, reaching an accuracy of up to 0.9895. Experimental results on the medical datasets further demonstrate the method's advantages on a practical FS problem in terms of accuracy, feature-subset size, and fitness, with best values of 0.913, 5.7, and 0.0873, respectively. The results indicate that NL-BGWOA is comprehensively superior in solving the FS problem for high-dimensional data.
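The abstract reports accuracy, feature-subset size, and fitness but does not define the fitness function. The sketch below illustrates one formulation often used in wrapper-based FS: a weighted sum of classification error and the fraction of features selected, evaluated on a binary mask. It is an assumption for illustration, not the paper's method. The weight `alpha`, the leave-one-out 1-nearest-neighbour classifier, and the function names `loo_1nn_error` and `fs_fitness` are all hypothetical choices.

```python
import math


def loo_1nn_error(X, y, mask):
    """Leave-one-out error rate of a 1-nearest-neighbour classifier
    restricted to the features where mask[j] == 1 (assumed evaluator)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # no features selected: treat as the worst case
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_dist:
                best_dist, best_label = d, y[k]
        errors += best_label != y[i]
    return errors / len(X)


def fs_fitness(X, y, mask, alpha=0.99):
    """Weighted sum of error rate and selected-feature ratio; lower is better.
    alpha is an assumed weight, not a value taken from the paper."""
    ratio = sum(mask) / len(mask)
    return alpha * loo_1nn_error(X, y, mask) + (1 - alpha) * ratio


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]]
y = [0, 0, 1, 1]

print(fs_fitness(X, y, [1, 0]))  # mask keeping only the informative feature
print(fs_fitness(X, y, [1, 1]))  # mask keeping both features
```

On this toy data the mask that drops the noisy feature scores lower (better), which is the behaviour a binary optimizer such as NL-BGWOA would exploit when searching over masks.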
Funding: Supported by the National Key Research and Development Program of China (2021YFF1201200 and 2021YFF1200900); the National Natural Science Foundation of China (32341008 and 62088101); the Shanghai Pilot Program for Basic Research; the Shanghai Science and Technology Innovation Action Plan-Key Specialization in Computational Biology; the Shanghai Shuguang Scholars Project; the Shanghai Excellent Academic Leader Project; the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100); and the Fundamental Research Funds for the Central Universities.
Abstract: Precision medicine aims to empower clinicians to predict the most appropriate course of action for patients with complex diseases such as cancer [1]. With efficient interrogation of the omics, molecular, and clinical data at play in diseases, effective, personalized, and precise treatment strategies are expected for many disorders. In addition, the treatment modalities of precision medicine are increasingly diversified, spanning from mainstream drug and antibody treatments to newly developed gene and cell therapies. Across these modalities, including treatment by Chinese medicine, precision is the highest demand, and omics data can identify the underlying patterns and rationales across various factors, uncovering the biological mechanisms and actionable information that support early detection, prevention, and therapy of complex disorders [1,2]. Therefore, omics sequencing technology and accumulated omics data serve as the core technology and resources that can have a huge impact on such a course of action. Accordingly, the integrative analysis of such omics data provides a great opportunity to support precision medicine studies. However, given the unique features of omics data, notably the dominant characteristic of “high dimension and small sample,” traditional data analysis strategies are likely unsuitable, and data-driven artificial intelligence (AI) technology is emerging as an effective paradigm for precision medicine studies. To this end, this study presents a concise overview of the characteristics and emerging directions of omics sequencing data and of the related AI analysis schemas for handling such data, together with a summary of precision therapy cases based on the integration of AI and omics.