Funding: This research is supported by Tier-1 Research Grant, vote no. H938, from the Research Management Office (RMC), Universiti Tun Hussein Onn Malaysia, and the Ministry of Higher Education, Malaysia.
Abstract: Nature-inspired metaheuristic algorithms have recently been widely adopted because of their strong performance on complex problems. Among them, the bat algorithm has become popular for its simplicity and its tendency to converge to the global optimum most of the time. However, the standard bat algorithm with a random walk can still become stuck in local minima. To address this problem, this research proposes a bat algorithm with a Levy flight random walk. The proposed Bat with Levy flight algorithm is then hybridized with three different variants of ANN. The proposed BatLFBP is applied to the problem of insulin DNA sequence classification for healthy Homo sapiens. For classification performance, the proposed models, Bat Levy Flight Artificial Neural Network (BatLFANN) and Bat Levy Flight Back Propagation (BatLFBP), are compared with other state-of-the-art algorithms, namely Bat Artificial Neural Network (BatANN), Bat Back Propagation (BatBP), Bat Gaussian Distribution Artificial Neural Network (BatGDANN), and Bat Gaussian Distribution Back Propagation (BatGDBP), in terms of mean squared error (MSE) and accuracy. The simulation results show that on WL5 the proposed BatLFANN achieved 99.88153% accuracy with an MSE of 0.001185, and BatLFBP achieved 99.834185% accuracy with an MSE of 0.001658. On WL10, the proposed BatLFANN achieved 99.89899% accuracy with an MSE of 0.00101, and BatLFBP achieved 99.84473% accuracy with an MSE of 0.004553. Similarly, on WL15 the proposed BatLFANN achieved 99.82853% accuracy with an MSE of 0.001715, and BatLFBP achieved 99.3262% accuracy with an MSE of 0.006738. These accuracies are better than those of the other hybrid models.
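The abstract above does not say how the Levy flight steps are generated. One common choice is Mantegna's algorithm, and the following is a minimal sketch under that assumption. The function names, the stability index `beta=1.5`, and the `step_scale` value are illustrative and are not taken from the paper.

```python
import math
import random


def mantegna_sigma(beta):
    """Scale of the numerator Gaussian in Mantegna's algorithm (assumed method)."""
    numerator = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    denominator = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (numerator / denominator) ** (1 / beta)


def levy_step(beta=1.5, rng=random):
    """Draw one heavy-tailed step; beta is an illustrative stability index."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # avoid division by zero in the (very unlikely) exact-zero case
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def levy_local_walk(position, step_scale=0.01, beta=1.5, rng=random):
    """Perturb every coordinate of a candidate solution with a Levy step.

    In a bat-style search this would stand in for the uniform random walk
    around the current best solution; step_scale is a hypothetical tuning knob.
    """
    return [x + step_scale * levy_step(beta, rng) for x in position]
```

Most draws are small, but occasional very long jumps occur. That heavy tail is the property usually cited as helping a search escape local minima.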
Abstract: In recent years, the convolutional neural network, a deep learning model able to extract high-level abstract features from minimally preprocessed data, has been widely used. In this research, we propose a new approach to classifying DNA sequences with a convolutional neural network by treating the sequences as text data. We use one-hot vectors to represent the sequences as input to the model, which preserves the essential position information of each nucleotide in a sequence. We evaluated the proposed model on 12 DNA sequence datasets and achieved significant improvements on all of them. This result shows the potential of convolutional neural networks on DNA sequences for solving other sequence problems in bioinformatics.
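The one-hot representation mentioned above can be sketched as follows. This is an illustration only: the column order `ACGT` and the all-zero row for ambiguous bases such as `N` are assumptions, not details given in the abstract.

```python
NUCLEOTIDES = "ACGT"  # assumed column order


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide, keeping sequence order."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in index:  # unknown symbols (e.g. N) stay all-zero by assumption
            row[index[base]] = 1
        encoded.append(row)
    return encoded
```

Because row i always describes position i of the sequence, the positional information of each nucleotide is kept, which is what lets a convolutional filter learn local motifs.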
Funding: Funded by the Researchers Supporting Project number (RSPD2025R857), King Saud University, Riyadh, Saudi Arabia.
Abstract: Many bioinformatics applications require determining the class of a newly sequenced deoxyribonucleic acid (DNA) sequence, making DNA sequence classification an integral step in bioinformatics analysis, where large biomedical datasets are transformed into valuable knowledge. Existing methods rely on a feature extraction step and suffer from high computational time requirements. In contrast, newer approaches leveraging deep learning have shown significant promise in enhancing accuracy and efficiency. In this paper, we investigate the performance of various deep learning architectures for DNA sequence classification: Convolutional Neural Network (CNN), CNN-Long Short-Term Memory (CNN-LSTM), CNN-Bidirectional Long Short-Term Memory (CNN-BiLSTM), Residual Network (ResNet), and InceptionV3. Various numerical and visual data representation techniques are used to represent the input datasets, including label encoding, k-mer sentence encoding, k-mer one-hot vectors, Frequency Chaos Game Representation (FCGR), and 5-Color Map (ColorSquare). Three datasets are used to train the models: H3, H4, and the DNA Sequence Dataset (Yeast, Human, Arabidopsis thaliana). Experiments are performed to determine which combination of DNA representation and deep learning architecture yields the best performance on the classification task. Our results indicate that a hybrid CNN-LSTM neural network trained on DNA sequences represented as one-hot encoded k-mer sequences yields the best performance, achieving an accuracy of 92.1%.
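Two of the representations listed above, k-mer sentence encoding and k-mer one-hot vectors, can be illustrated with a short sketch. The value `k=3`, the stride of 1, and the lexicographic `ACGT` vocabulary order are assumptions chosen for illustration; the abstract does not state the settings used.

```python
from itertools import product


def kmer_sentence(sequence, k=3, stride=1):
    """Split a sequence into overlapping k-mers joined by spaces."""
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]
    return " ".join(kmers)


def kmer_vocabulary(k=3, alphabet="ACGT"):
    """Map every possible k-mer to an integer index (assumed ordering)."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}


def kmer_one_hot(sequence, k=3):
    """One-hot encode each k-mer as a vector of length 4**k."""
    vocab = kmer_vocabulary(k)
    vectors = []
    for kmer in kmer_sentence(sequence, k).split():
        row = [0] * len(vocab)
        if kmer in vocab:  # k-mers with ambiguous bases stay all-zero
            row[vocab[kmer]] = 1
        vectors.append(row)
    return vectors
```

For example, `kmer_sentence("ATGCA")` gives `"ATG TGC GCA"`, and with `k=3` each k-mer becomes a 64-dimensional one-hot vector.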
Abstract: The energy of interaction between DNA strands in promoters is of great functional importance. The distribution of DNA strand interaction energy across promoter sequences was visualized. Separating promoters into groups by their energetic properties makes it possible to evaluate how promoter strength depends on those properties. Analysis of the groups (clusters) of promoters, distributed by the energy of DNA strand interaction in the -55, -35, -10, and +6 sequences, indicates their connection with transcriptional activity.