Funding: This work was funded by the Graduate Scientific Research School at Yarmouk University under Grant Number 82/2020.
Abstract: Publicly accessible data banks hold quintillions of data records on deoxyribonucleic acid (DNA) and proteins, and that number is growing exponentially. Many scientific fields, such as bioinformatics and drug discovery, rely on such data; nevertheless, gathering and extracting data from these resources is a difficult undertaking. The data must pass through several processes, including mining, processing, analysis, and classification. This study proposes software that extracts data from big data repositories automatically and can repeat the extraction phases as many times as needed without human intervention. The software simulates data extraction from web-based (point-and-click) resources or graphical user interfaces that cannot be accessed with command-line tools. It was evaluated by creating a novel database of 34 physicochemical-property parameters for 1,360 antimicrobial peptide (AMP) sequences (46,240 hits) from various MARVIN software panels, which can later be used to develop novel AMPs. Furthermore, for machine learning research, the program was validated by extracting 10,000 protein tertiary structures from the Protein Data Bank. As a result, data collection from the web becomes faster and less expensive, with no need for manual data extraction. The software is a critical first step in preparing large datasets for subsequent stages of analysis, such as machine learning and deep learning applications.
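The abstract stresses repeating extraction phases without human intervention. The following Python sketch illustrates only that retry-and-requeue idea for a file-based endpoint; it is not the authors' GUI-automation software. It assumes the public RCSB download URL pattern `https://files.rcsb.org/download/<ID>.pdb`. The `extract_all` helper and the injected `fetch` callable are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple

# Public RCSB PDB file-download pattern (assumption: the paper's tool may use another route).
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def extract_all(
    ids: Iterable[str],
    fetch: Callable[[str], bytes],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Run one extraction pass over `ids`, retrying each failed download
    up to `max_attempts` times so no manual intervention is needed."""
    results: Dict[str, bytes] = {}
    failed: List[str] = []
    for pdb_id in ids:
        url = RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper())
        for attempt in range(1, max_attempts + 1):
            try:
                results[pdb_id] = fetch(url)
                break  # success: move on to the next identifier
            except OSError:  # urllib's URLError/HTTPError are OSError subclasses
                if attempt == max_attempts:
                    failed.append(pdb_id)  # give up for now; can be re-queued later
                else:
                    time.sleep(delay_s)  # back off before retrying
    return results, failed
```

With only the standard library, `fetch` could be `lambda url: urllib.request.urlopen(url, timeout=30).read()`. The returned `failed` list can be passed back into `extract_all` for another pass, which repeats the phase without manual steps.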
Abstract: Complex proteins are needed for many biological activities. How amino acid chains fold reveals the proteins' properties and functions, which support healthy tissue structure, physiology, and homeostasis. Precision medicine and treatments require quantitative identification of proteins and their functions. Despite technical advances and extensive exploration of protein sequence data, bioinformatics' "basic structure" problem, the automatic deduction of a protein's properties from its amino acid sequence, remains unsolved. Inferring protein function from amino acid sequences is the main challenge in biological data analysis. This study analyzes whether raw sequences can characterize biological facts. A large corpus of protein sequences from the related protein families of the Globin-like superfamily is used to generate a robust vector representation. A coding technique was devised for each sequence in each family, using two representations to identify each amino acid precisely. Bispectral analysis then converts the encoded numerical protein sequences into images for better discrimination between protein sequences and families. Training and validation used 70% of the dataset, and the remaining 30% was used for testing. This paper examines the performance of multistage deep learning models in differentiating between sixteen protein families after each encoded sequence is represented by a higher-order spectral image (bispectrum). Cascading reduced false-positive and false-negative cases in all phases. The first stage separated two classes (a group of six families and a group of ten families). The subsequent stages focused on the few classes that the first stage separated almost accurately, and reduced the overlap between families observed in single-stage deep learning classification. The single-stage technique achieved 64.2% ± 22.8% accuracy, 63.3% ± 17.1% precision, and a 63.2% ± 19.4% F1-score. The two-stage technique achieved 92.2% ± 4.9% accuracy, 92.7% ± 7.0% precision, and a 92.3% ± 5.0% F1-score. The approach provides balanced, reliable, and precise predictions for all families on all measures, showing that the new model is resilient to family variance and yields high scores.
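The following Python sketch shows one plausible reading of the two-stage cascade. A first-stage model assigns each sample to a coarse group (for example, the six-family or the ten-family group), and a group-specific second-stage model then picks the family. The `stage1` and `stage2` callables stand in for trained CNNs and are hypothetical. The abstract does not say how its ± figures were computed, so the helpers assume they are the mean ± standard deviation of per-family scores.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

import numpy as np


def cascade_predict(
    samples: Sequence,
    stage1: Callable[[object], Hashable],
    stage2: Dict[Hashable, Callable[[object], Hashable]],
) -> List[Hashable]:
    """Two-stage cascade: stage1 assigns a coarse group, then the
    group-specific stage2 model picks the family inside that group."""
    return [stage2[stage1(x)](x) for x in samples]


def per_class_scores(y_true, y_pred) -> Dict[Hashable, Tuple[float, float, float]]:
    """One-vs-rest (precision, recall, F1) for every class present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = {}
    for c in np.unique(y_true):
        tp = int(np.sum((y_pred == c) & (y_true == c)))
        fp = int(np.sum((y_pred == c) & (y_true != c)))
        fn = int(np.sum((y_pred != c) & (y_true == c)))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c.item()] = (prec, rec, f1)
    return scores


def mean_std(scores, index: int) -> Tuple[float, float]:
    """Mean and standard deviation of one metric across classes
    (index 0 = precision, 1 = recall, 2 = F1)."""
    vals = np.array([s[index] for s in scores.values()])
    return float(vals.mean()), float(vals.std())
```

Routing every sample through exactly one second-stage model means each fine classifier only has to separate the families within its own group. This is where the cascade is meant to cut the cross-family overlap seen in the single-stage model.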
Abstract: Proteins are essential for many biological functions. The folding of amino acid chains reveals their functions, which include maintaining tissue structure, physiology, and homeostasis. Quantifiable protein characteristics are vital for improving therapies and precision medicine. The automatic inference of a protein's properties from its amino acid sequence, known as the "basic structure" problem, remains a critical unsolved challenge in bioinformatics despite recent technological advances and the investigation of protein sequence data. Inferring protein function from amino acid sequences is crucial in biology. This study uses raw sequences to explain biological facts, using a large corpus of protein sequences from the Globin-like superfamily to generate a vector representation. The power of two representations was used to identify each amino acid, and a coding technique was established for each sequence family. The encoded numerical protein sequences are then transformed into images using bispectral analysis to capture the characteristics that discriminate between protein sequences and their families. Non-normalized and normalized encoding techniques were developed, and a deep Convolutional Neural Network (CNN) classifies the resulting images. Two data splits were used: 70% training and 30% testing, and 70% training, 15% validation, and 15% testing. The methods are evaluated using accuracy, precision, and recall. Without validation, the non-normalized method achieved 70% accuracy, 72% precision, and 71% recall; with validation, it achieved 68% accuracy, 67% precision, and 67% recall. The normalized approach achieved 92.4% accuracy, 94.3% precision, and 91.1% recall without validation, and 90% accuracy, 91.2% precision, and 89.7% recall with validation. Both approaches outperform the other methods compared. The paper shows that bispectrum-based nonlinear analysis with deep learning models outperforms standard machine learning methods and other deep learning methods based on convolutional architectures. The proposed approach offered the best inference performance, improving categorization and prediction, and several examples demonstrate successful multi-class prediction on large-scale molecular biology data.
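The following is a minimal Python sketch of the encode-then-bispectrum pipeline. It rests on two assumptions that the abstract does not confirm. First, it reads "power of two representations" as giving the i-th standard amino acid the value 2^i, in alphabetical one-letter order. Second, it uses a single-segment direct bispectrum estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2). The paper's actual code table, normalization, segmentation, and windowing are not given. `encode`, `direct_bispectrum`, and the example sequence are illustrative only.

```python
import numpy as np

# Hypothetical power-of-two code: the i-th standard amino acid (alphabetical
# one-letter order) maps to 2**i. The paper's actual assignment is not given.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: 2 ** i for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq: str, normalize: bool = False) -> np.ndarray:
    """Map a protein sequence to a numeric signal (KeyError on non-standard residues)."""
    values = np.array([CODE[aa] for aa in seq.upper()], dtype=float)
    if normalize:
        values /= max(CODE.values())  # assumed normalization: scale into (0, 1]
    return values


def direct_bispectrum(x: np.ndarray, nfft: int = 0) -> np.ndarray:
    """Single-segment direct estimate B(f1, f2) = X(f1) X(f2) X*(f1 + f2)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the DC component before the FFT
    n = nfft or len(x)
    X = np.fft.fft(x, n)  # zero-pads when n > len(x)
    f1, f2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # DFT periodicity lets (f1 + f2) wrap modulo n
    return X[f1] * X[f2] * np.conj(X[(f1 + f2) % n])


# |B|, log-scaled here, can be stored as a 2-D image and fed to a CNN.
image = np.log1p(np.abs(direct_bispectrum(encode("MVLSPADKTNVKAAW", normalize=True))))
```

Practical bispectrum estimators usually average B over several windowed segments to reduce variance. The single-segment form keeps the sketch short.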
Abstract: The relationship between lung transplant success and the characteristics of recipients and donors has long been studied. However, building a robust model of their potential impact on organ transplant success has proved challenging. In this study, a hybrid feature selection model based on ant colony optimization (ACO) and a k-nearest neighbor (kNN) classifier was developed to investigate the relationship between the most defining recipient/donor features and lung transplant success, using data from the United Network for Organ Sharing (UNOS). The proposed ACO-kNN approach explores the feature space to identify representative attributes and classify patients' functional status (i.e., quality of life) after lung transplantation. The efficacy of the proposed model was verified using 3,684 records and 118 input features from UNOS. The approach was also used to examine the reliability and validity of the lung allocation process. The results are promising, with a prediction accuracy of 91.3%, low computational time, and better decision capabilities, highlighting the potential for automatic classification in lung and other organ allocation processes. In addition, the proposed model offers a new perspective on how medical experts and clinicians respond to uncertain and challenging lung allocation strategies. With such an ACO-kNN model, medical professionals can summarize information and make decisions when allocating donor organs for upcoming transplants.
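The abstract does not detail the ACO-kNN procedure. The following Python/NumPy sketch is a simplified binary ant-colony feature-selection loop wrapped around a plain k-NN classifier. The pheromone-to-probability rule, the evaporation rate `rho`, the 70/30 hold-out split, and all function names are assumptions for illustration, not the authors' method.

```python
import numpy as np


def knn_accuracy(X_tr, y_tr, X_te, y_te, k=3):
    """Hold-out accuracy of a Euclidean k-NN majority vote (labels: non-negative ints)."""
    d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    pred = np.array([np.bincount(row).argmax() for row in y_tr[nn]])
    return float(np.mean(pred == y_te))


def aco_knn_select(X, y, n_ants=10, n_iter=20, rho=0.2, k=3, seed=0):
    """Simplified ant-colony feature selection wrapped around k-NN.

    Each ant includes feature j with a probability that grows with its
    pheromone tau[j]; pheromone evaporates at rate rho and is reinforced on
    the best subset found so far, in proportion to that subset's accuracy."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    perm = rng.permutation(n)
    split = int(0.7 * n)
    tr, te = perm[:split], perm[split:]
    tau = np.ones(m)  # pheromone level per feature
    best_mask, best_acc = None, -1.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            mask = rng.random(m) < tau / (tau + tau.mean())
            if not mask.any():
                mask[rng.integers(m)] = True  # every ant must pick at least one feature
            acc = knn_accuracy(X[tr][:, mask], y[tr], X[te][:, mask], y[te], k)
            # Prefer higher accuracy, then fewer features on ties.
            if acc > best_acc or (acc == best_acc and mask.sum() < best_mask.sum()):
                best_mask, best_acc = mask.copy(), acc
        tau *= 1.0 - rho  # evaporation
        tau[best_mask] += best_acc  # reinforcement of the best subset
    return np.flatnonzero(best_mask), best_acc
```

The tie-break toward smaller subsets mirrors the goal of finding the "most defining" features rather than simply the most accurate feature set.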
Abstract: The main purpose of this study is to develop a mathematical model for calculating the probability of money laundering by monitoring client behavior against 70 money laundering indicators. The scientific method used in this study, drawn from modern criminology, has great investigative power and is widely applicable. It is hoped that the practical application of this study will greatly increase the probability of detecting and punishing clients involved in money laundering. In particular, it will be useful for banks, the Financial Intelligence Unit (FIU) of Albania, the Department of Economic Crime at the Ministry of Domestic Affairs, and the Albanian State Intelligence Service (SIS). The investigation of money laundering is also a useful tool for detecting other crimes, such as drug trafficking, human trafficking, and the illegal arms trade. At the same time, preventing money laundering is a powerful strike against terrorism at both national and international levels.
Abstract: A new method is introduced that uses machine learning algorithms to predict the success rate of science fiction films. In the film industry, extensive research and accurate forecasting are vital for anticipating a movie's success before its release. This study aims to use available data to estimate a film's early success rate. The internet offers a wealth of movie-related information, including actors, directors, critic reviews, user reviews, ratings, writers, budgets, genres, Facebook likes, YouTube views for movie trailers, and Twitter followers. The first few weeks after release are crucial in determining a film's fate, and online reviews and film evaluations strongly affect its opening-week earnings. Hence, this research uses supervised machine learning techniques to predict a film's success. The Internet Movie Database (IMDb) is a comprehensive data repository covering nearly all movies. A predictive classification approach is developed using several machine learning algorithms: fine, medium, coarse, cosine, cubic, and weighted KNN. To determine the best model, the performance of each feature was evaluated with composite metrics. The study also accounts for the significant influence of social media platforms, including Twitter, Instagram, and Facebook, on shaping individuals' opinions. A hybrid success-rating prediction model is obtained by integrating the proposed prediction models with sentiment analysis from these platforms. The findings show that the chosen algorithms offer more precise estimates, faster execution, and higher accuracy than previous research. By integrating the features of existing prediction models and social media sentiment analysis models, the proposed approach predicts a movie's success accurately. This can help movie producers and marketers anticipate a film's success before its release and tailor their promotional activities accordingly. Furthermore, the research lays a foundation for developing more accurate prediction models as social media platforms play an ever larger role in shaping individuals' opinions. In conclusion, this study shows the potential of machine learning algorithms for predicting the success rate of science fiction films, opening new avenues for the film industry.
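The six KNN variants named above differ mainly in neighbor count, distance metric, and vote weighting. The Python sketch assumes the preset values of MATLAB's Classification Learner app, as we understand them:

- fine: k = 1
- medium: k = 10
- coarse: k = 100
- cosine: k = 10 with cosine distance
- cubic: k = 10 with Minkowski p = 3 distance
- weighted: k = 10 with squared-inverse-distance weights

The paper does not state its settings, so treat these values and the helper names as assumptions.

```python
import numpy as np

# Assumed presets for the six KNN variants (modeled on MATLAB Classification Learner).
PRESETS = {
    "fine": dict(k=1, metric="euclidean", weighted=False),
    "medium": dict(k=10, metric="euclidean", weighted=False),
    "coarse": dict(k=100, metric="euclidean", weighted=False),
    "cosine": dict(k=10, metric="cosine", weighted=False),
    "cubic": dict(k=10, metric="cubic", weighted=False),
    "weighted": dict(k=10, metric="euclidean", weighted=True),
}


def distances(Q, X, metric):
    """Pairwise distances between query rows Q and training rows X."""
    if metric == "cosine":  # 1 - cosine similarity (rows must be non-zero)
        Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return 1.0 - Qn @ Xn.T
    p = 3 if metric == "cubic" else 2  # cubic = Minkowski p = 3; otherwise Euclidean
    return (np.abs(Q[:, None, :] - X[None, :, :]) ** p).sum(axis=2) ** (1.0 / p)


def knn_predict(X_train, y_train, X_query, k, metric, weighted):
    X_train = np.asarray(X_train, float)
    X_query = np.asarray(X_query, float)
    y_train = np.asarray(y_train)
    d = distances(X_query, X_train, metric)
    k = min(k, len(X_train))  # keeps coarse KNN usable on small data
    nn = np.argsort(d, axis=1)[:, :k]
    classes = np.unique(y_train)
    preds = []
    for row, idx in enumerate(nn):
        # squared-inverse-distance weights for "weighted", equal votes otherwise
        w = 1.0 / (d[row, idx] ** 2 + 1e-12) if weighted else np.ones(k)
        votes = [w[y_train[idx] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(votes))])
    return np.array(preds)
```

For example, `knn_predict(X, y, X_new, **PRESETS["cubic"])` runs the cubic variant. Comparing all six presets on the same split is one way to pick a best model.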
Abstract: A parametric study of a series of technological and geometrical parameters affecting the rise time of Al/a-SiC/c-Si(p)/c-Si(n⁺)/Al thyristor-like switches is presented here for the first time, using two-dimensional simulation techniques. By varying the anode current values in the simulation procedure, we achieved very good agreement between simulated and experimental rise-time characteristics of the switch. A series of factors affecting the rise time of the switches is studied. Of all the factors studied, the two with the most significant influence on the rise time, more than one order of magnitude, are the widths of the a-SiC and c-Si(p) regions, validating our previously presented model of device operation. These widths can easily be varied during device fabrication. We also successfully simulated the rise-time characteristics of our previously presented (simulated) improved switch, with forward breakover voltage V(BF) = 11 V and forward voltage drop VF = 9.5 V in the ON state, which exhibits an ultra-low rise time of less than 10 ps. Together with its high anode current density of 12 A/mm² and cheap, easy fabrication, this makes the switch suitable for ESD protection as well as RF MEMS and NEMS applications.
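Rise time is commonly measured between the 10% and 90% points of a rising waveform. The abstract does not state which thresholds were used, so the Python sketch assumes that convention. It computes the rise time from sampled simulation output (for example, anode current versus time) by linear interpolation between samples. `rise_time` is a hypothetical helper, not part of any simulator's API.

```python
import numpy as np


def rise_time(t, y, lo=0.1, hi=0.9):
    """10%-90% rise time of a rising waveform, taking the first and last
    samples as the baseline and settled values and interpolating linearly
    between samples to locate each threshold crossing."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    y0, y1 = y[0], y[-1]  # assumes the waveform rises (y1 > y0) and has settled

    def crossing(frac):
        level = y0 + frac * (y1 - y0)
        i = int(np.argmax(y >= level))  # first sample at or above the level
        if i == 0:
            return t[0]
        # y[i-1] < level <= y[i], so the denominator is never zero
        return t[i - 1] + (level - y[i - 1]) * (t[i] - t[i - 1]) / (y[i] - y[i - 1])

    return crossing(hi) - crossing(lo)
```

With `t` in picoseconds, the result is also in picoseconds. For a first-order response `1 - exp(-t/τ)`, the helper returns approximately τ·ln 9.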
Abstract: ScalaLab is a MATLAB-like environment for the Java Virtual Machine (JVM), based on the Scala programming language. It uses an extensive set of Java and Scala scientific libraries and also has access to many native C/C++ scientific libraries, mainly through the Java Native Interface (JNI). The performance of the JVM platform is improving rapidly, and today the JVM can effectively support demanding high-performance computing and scales well on multicore platforms. However, optimized native C/C++ code can sometimes yield even better performance by exploiting low-level details such as cache optimization and architecture-dependent instruction sets. The present work reports some of the experience we gained from experiments with both Just-in-Time (JIT) compiled JVM code and native code. We compare aspects of Scala and C++ that concern the requirements of scientific computing and highlight some strong features of the Scala language that facilitate scientific scripting. This paper describes how ScalaLab tries to combine the best features of the JVM with those of C/C++ technology to implement an effective scientific computing environment.