Funding: Supported by the National Natural Science Foundation of China (62137002, 62250009, 62202367, 82025020, and 82230072).
Abstract: Large models, exemplified by ChatGPT, have reached the pinnacle of contemporary artificial intelligence (AI). However, they suffer from three inherent drawbacks: excessive consumption of training data and computing power, susceptibility to catastrophic forgetting, and a lack of logical reasoning capability within black-box models. To address these challenges, we draw on human memory mechanisms to introduce “machine memory,” which we define as a storage structure formed by encoding external information into a machine-representable and computable format. Centered on machine memory, we propose the machine memory intelligence (M²I) framework, which encompasses representation, learning, and reasoning modules and loops. We explore the key issues and recent advances in four core aspects of M²I: neural mechanisms, associative representation, continual learning, and collaborative reasoning within machine memory. M²I aims to free machine intelligence from the confines of data-centric neural networks and to break through the limitations of existing large models, driving a qualitative leap from weak to strong AI.
Funding: Supported by the National Key Research and Development Program of China (2023YFF1000100); the Hebei Provincial Science and Technology Plan Modern Breeding Industry Science and Technology Innovation Special Project (21326316D and 21326302D); the Guiding Special Fund for Central Universities to Build World-Class Universities and Promote Characteristic Development (2025AC030); and the Pinduoduo-China Agricultural University Research Fund (PC2024A01003).
Abstract: Structural variations (SVs), especially presence–absence variations (PAVs), are crucial in crop domestication and trait improvement. Although pan-genome analysis provides an exhaustive view of PAVs, it is often limited by high costs and restricted sample sizes. In contrast, genome-wide association studies (GWASs) can effectively identify trait–marker associations in large populations but typically overlook PAVs and struggle to distinguish causal variants because of linkage disequilibrium. In this study, we performed de novo assembly of eight reference-quality foxtail millet (Setaria italica) genomes and constructed a graph-based pan-genome to systematically explore PAVs. We then performed a GWAS with 344 millet accessions, targeting genomic regions associated with the color of the leaf, leaf sheath, and leaf pulvinus. Using interpretable machine-learning models, we identified large-effect variants in the 26.84–26.94 Mb interval on chromosome 7, including a 5002-bp Copia element insertion and other key variants associated with phenotypic variation in leaf color traits. This integrative approach combines the detailed variant-detection capabilities of pan-genome analysis with the large-scale mapping potential of GWASs and enhances variant prioritization through interpretable machine learning, providing a cost-efficient yet effective framework for studying agronomic traits in crops.
Funding: Supported by the National High Technology Research and Development Program of China (No. 2012AA120802); the National Natural Science Foundation of China (No. 61771186); the Postdoctoral Research Project of Heilongjiang Province (No. LBH-Q15121); and the Undergraduate University Project of Young Scientist Creative Talent of Heilongjiang Province (No. UNPYSCT-2017125).
Abstract: Multi-label learning is an active research area that plays an important role in machine learning. Traditional learning algorithms, however, depend on samples with complete labels. Existing algorithms for learning with missing labels do not consider the relevance between labels, which leads to label estimation errors on new samples. A new multi-label learning algorithm with support vector machine (SVM) based association (SVMA) is proposed to estimate missing labels by constructing the association between different labels. SVMA establishes a mapping function that minimizes the number of samples within the margin while keeping the margin sufficiently large and minimizing the misclassification probability. To evaluate the performance of SVMA under missing labels, four typical data sets are adopted, with the completeness of the labels controlled manually. Simulation results show that SVMA outperforms other models in image classification when handling samples with missing labels.
Funding: Supported by the National Natural Science Foundation of China (32261143468); the National Key Research and Development (R&D) Program of China (2021YFC2600400); the Seed Industry Revitalization Project of Jiangsu Province (JBGS(2021)001); and the Project of Zhongshan Biological Breeding Laboratory (BM2022008-02).
Abstract: The traditional method of screening plants for disease resistance phenotypes is both time-consuming and costly. Genomic selection offers a potential way to improve efficiency, but accurately predicting plant disease resistance remains a challenge. In this study, we evaluated eight machine learning (ML) methods for predicting plant disease resistance: random forest classification (RFC), support vector classifier (SVC), light gradient boosting machine (lightGBM), random forest classification plus kinship (RFC_K), support vector classification plus kinship (SVC_K), light gradient boosting machine plus kinship (lightGBM_K), deep neural network genomic prediction (DNNGP), and densely connected convolutional networks (DenseNet). Our results demonstrate that the three plus kinship (K) methods developed in this study achieved high prediction accuracy. Specifically, these methods achieved accuracies of up to 95% for rice blast (RB), 85% for rice black-streaked dwarf virus (RBSDV), and 85% for rice sheath blight (RSB) when trained and applied to the rice diversity panel I (RDPI). The plus K models also performed well in predicting wheat blast (WB) and wheat stripe rust (WSR), with mean accuracies of up to 90% and 93%, respectively. To assess the generalizability of our models, we applied the trained plus K methods to predict RB resistance in an independent population, rice diversity panel II (RDPII). Concurrently, we evaluated the RB resistance of RDPII cultivars using spray inoculation. Comparing the predictions with the spray inoculation results, we found that the accuracy of the plus K methods reached 91%. These findings highlight the effectiveness of the plus K methods (RFC_K, SVC_K, and lightGBM_K) in accurately predicting plant disease resistance for RB, RBSDV, RSB, WB, and WSR. The methods developed in this study not only provide valuable strategies for predicting disease resistance but also pave the way for using machine learning to streamline genome-based crop breeding.
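The abstract does not state how kinship enters the "plus K" models. As a rough sketch only, the block below assumes one common construction: a VanRaden-style genomic relationship matrix computed from 0/1/2 genotype codes, with each individual's kinship row appended to its marker features before training a classifier. The function names `vanraden_kinship` and `add_kinship_features` are hypothetical and are not taken from the study.

```python
def vanraden_kinship(genotypes):
    """Genomic relationship matrix from genotypes coded as 0/1/2 allele counts.

    genotypes: list of rows, one row per individual, one column per marker.
    Assumption: VanRaden-style scaling; the study may define kinship differently.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of each marker across all individuals.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by twice the allele frequency of its marker.
    centred = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(ri, rj)) / scale for rj in centred]
        for ri in centred
    ]


def add_kinship_features(genotypes):
    """Append each individual's kinship row to its marker features (hypothetical)."""
    kinship = vanraden_kinship(genotypes)
    return [list(g) + k for g, k in zip(genotypes, kinship)]
```

The augmented rows could then be passed to any classifier, such as a random forest or an SVM. Whether the study concatenates kinship in this way or uses it differently is not stated in the abstract.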
Funding: Lanzhou Talent Innovation and Entrepreneurship Project (No. 2020-RC-14).
Abstract: Single nucleotide polymorphism (SNP) is an important factor in the study of genetic variation in human families and in animal and plant strains, and it is therefore widely used in the study of population genetics and disease-related genes. In pharmacogenomics research, identifying the association between SNP sites and drugs is the key to clinical precision medication. A predictive model of SNP site and drug association based on a denoising variational auto-encoder (DVAE-SVM) is therefore proposed. First, the k-mer algorithm is used to construct the initial SNP site feature vector, and the MACCS molecular fingerprint is introduced to generate the feature vector of the drug module. Then, the DVAE extracts the effective features from the initial feature vector of the SNP site. Finally, the effective feature vector of the SNP site and the feature vector of the drug module are fused and fed into a support vector machine (SVM) to predict the relationship between the SNP site and the drug module. Five-fold cross-validation experiments indicate that the proposed algorithm performs better than random forest (RF) and logistic regression (LR) classification. Further experiments show that the proposed algorithm yields better prediction results than the feature extraction algorithms principal component analysis (PCA), denoising auto-encoder (DAE), and variational auto-encoder (VAE).
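The k-mer step can be illustrated with a short sketch. It assumes that the SNP site is represented by a flanking DNA sequence and that the feature vector holds normalized k-mer frequencies. The value of k, the normalization, and the function name `kmer_vector` are illustrative assumptions, not details given in the abstract.

```python
from itertools import product


def kmer_vector(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies for a DNA sequence.

    The vector has len(alphabet) ** k entries in lexicographic k-mer order.
    Windows containing symbols outside the alphabet (for example N) are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = sequence.upper()
    total = 0
    # Slide a window of width k across the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in index:
            counts[index[window]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

For k = 3 this gives a 64-dimensional vector per sequence. In the pipeline described above, such a vector would be the input to the DVAE, whose output would be concatenated with the drug fingerprint before classification.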
Abstract: Association rule learning (ARL) is a widely used technique for discovering relationships within datasets. However, it often generates an excessive number of irrelevant or ambiguous rules. Post-processing is therefore crucial, not only for removing irrelevant or redundant rules but also for uncovering hidden associations that affect other factors. Several post-processing methods have been proposed recently, each with its own strengths and weaknesses. In this paper, we propose THAPE (Tunable Hybrid Associative Predictive Engine), which combines descriptive and predictive techniques to improve the analysis of generated rules. This includes removing irrelevant or redundant rules, uncovering interesting and useful rules, exploring hidden association rules that may affect other factors, and providing backtracking ability for a given product. The proposed approach can be tailored to retailers' specific goals, enabling them to better understand customer behavior based on factual transactions in the target market. We applied THAPE to a real dataset as a case study to demonstrate its effectiveness and mined a concise set of highly interesting and useful association rules. Out of the 11,265 rules generated, we identified 125 rules that are particularly relevant to the business context. These rules significantly improve the interpretability and usefulness of association rules for decision-making.
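Rule post-processing of this kind usually starts from standard interestingness measures. The sketch below computes support, confidence, and lift for a single rule over a list of transactions. It shows those textbook measures only and is not the THAPE procedure, whose details the abstract does not give. The function name `rule_metrics` and the example items are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    transactions: iterable of item collections, one per basket.
    """
    a, c = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in baskets if a <= t)         # baskets containing the antecedent
    n_c = sum(1 for t in baskets if c <= t)         # baskets containing the consequent
    n_ac = sum(1 for t in baskets if (a | c) <= t)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical retail baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

A simple post-processing filter would keep only rules whose lift exceeds 1 and whose confidence clears a chosen threshold. Reducing 11,265 rules to 125, as reported above, would require stronger criteria than this.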
Abstract: The rapid growth of social media use opens up new challenges and opportunities for analyzing various aspects and patterns of communication. In text mining, several techniques are available, such as information clustering, extraction, summarization, and classification. This study presents a text mining framework that consists of four phases: retrieving, processing, indexing, and association rule mining. The framework uses association rule mining to find the terms associated with the Huawei P30 Pro phone. Customer reviews were extracted from many websites and Facebook groups, such as re-view.cnet.com, CNET.Facebook, and amazon.com technology, where customers from all over the world posted their comments on cell phones. In this analysis, a total of 192 reviews of the Huawei P30 Pro were collected and evaluated with text mining techniques. The findings show that the Huawei P30 Pro has strong points such as the best safety, a high-quality camera, a battery that lasts more than 24 hours, and a very fast processor. This paper aims to show that text mining reduces human effort by identifying significant documents. This can help customers make more informed product choices and, at the same time, let sales managers learn how their products were received by customers.
Funding: Supported by the National High Technology Research and Development Program of China (No. 2007AA01Z132); the National Natural Science Foundation of China (No. 60775035, 60933004, 60970088, 60903141); the National Basic Research Priorities Programme (No. 2007CB311004); and the National Science and Technology Support Plan (No. 2006BAC08B06).
Abstract: In this paper, we propose an enhanced associative classification method that integrates the dynamic property into the process of associative classification. In the proposed method, we employ a support vector machine (SVM) based method to refine the discovered emerging frequent patterns, which are used to extend the classification rules for class label prediction. The empirical study shows that our method can classify increasing resources efficiently and effectively.
Funding: Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, Project Number RI-44-0444.
Abstract: Data mining plays a crucial role in extracting meaningful knowledge from large-scale data repositories, such as data warehouses and databases. Association rule mining, a fundamental process in data mining, involves discovering correlations, patterns, and causal structures within datasets. In the healthcare domain, association rules offer valuable opportunities for building knowledge bases, enabling intelligent diagnoses, and extracting valuable information rapidly. This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System (MLARMC-HDMS). The MLARMC-HDMS technique integrates classification and association rule mining (ARM) processes. Initially, the chimp optimization algorithm-based feature selection (COAFS) technique is employed within MLARMC-HDMS to select relevant attributes. The COA algorithm is inspired by the foraging behavior of chimpanzees and mimics their search strategy for food. Subsequently, the classification process uses a stochastic gradient descent with multilayer perceptron (SGD-MLP) model, while the Apriori algorithm determines attribute relationships. We propose a COA-based feature selection approach for medical data classification using machine learning techniques. This approach selects pertinent features from medical datasets through COA and trains machine learning models on the reduced feature set. We evaluate the performance of our approach on various medical datasets with diverse machine learning classifiers. Experimental results demonstrate that the proposed approach surpasses alternative feature selection methods, achieving higher accuracy and precision in medical data classification tasks. The study shows the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features, thereby enhancing the diagnosis and treatment of various diseases. For further validation, we conduct detailed experiments on a benchmark medical dataset, which show that the MLARMC-HDMS model outperforms other methods, with a maximum accuracy of 99.75%. This research therefore contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis. The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in healthcare data mining and machine learning.
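The Apriori step named above can be sketched in a few lines. This is a minimal textbook version of frequent itemset mining with the join and prune steps, not the MLARMC-HDMS implementation. The function name `apriori`, the support threshold, and the example attribute values are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for t in baskets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in baskets for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical patient records, each a set of observed attributes.
records = [
    {"fever", "cough"},
    {"fever", "cough", "fatigue"},
    {"fever"},
    {"cough", "fatigue"},
]
frequent = apriori(records, min_support=0.5)
```

Association rules are then read off the frequent itemsets by splitting each one into an antecedent and a consequent and keeping the splits that meet a confidence threshold.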