Funding: Supported by the National Key Basic Research and Development Program of China (Grant No. 2005CB120900), the National Natural Science Foundation of China (Grant No. 30500106), the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education, and the Department of Science and Technology of Zhejiang Province, China (Grant No. 2007C22025).
Abstract: RiceDB, a web-based integrated database for annotating rice microarray data in various biological contexts, was developed. It is composed of eight modules. The RiceMap module archives the mapping of Affymetrix probe sets to different rice databases and aims to annotate the genes represented by a microarray set by retrieving annotation information via the identifier or accession number used in each database; the RiceGO module shows the association between a microarray set and Gene Ontology (GO) categories; the RiceKO module annotates a microarray set based on the KEGG biochemical pathways; the RiceDO module reports the domain information associated with a microarray set; the RiceUP module retrieves promoter sequences for all genes represented by a microarray set; the RiceMR module lists potential microRNAs that regulate the genes represented by a microarray set; and the RiceCD and RiceGF modules annotate the genes represented by a microarray set in the context of chromosome distribution and rice paralogous family distribution, respectively. The results of automatic annotation are mostly consistent with manual annotation. RiceDB speeds up the biological interpretation of microarray data.
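The probe-set-to-gene-to-GO lookup described for the RiceMap and RiceGO modules can be pictured with a minimal sketch. This is an illustration only, not RiceDB's implementation: the tables `PROBE_TO_GENE` and `GENE_TO_GO`, the function `annotate`, and all identifiers in them are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical placeholder tables (assumption): a real system would load
# these mappings from the underlying rice annotation databases.
PROBE_TO_GENE = {
    "probe_A": "gene_1",
    "probe_B": "gene_2",
    "probe_C": "gene_1",  # several probe sets can map to the same gene
}
GENE_TO_GO = {
    "gene_1": ["GO_X"],
    "gene_2": ["GO_X", "GO_Y"],
}


def annotate(probe_sets):
    """Group the genes represented by a set of probe sets by GO category."""
    genes_by_term = defaultdict(set)
    for probe in probe_sets:
        gene = PROBE_TO_GENE.get(probe)
        if gene is None:
            continue  # probe set with no known gene mapping is skipped
        for term in GENE_TO_GO.get(gene, []):
            genes_by_term[term].add(gene)
    # Sort gene lists so the result is deterministic.
    return {term: sorted(genes) for term, genes in genes_by_term.items()}


if __name__ == "__main__":
    print(annotate(["probe_A", "probe_B", "probe_C", "probe_Z"]))
```

Using sets for the intermediate result means a gene reached through several probe sets is counted once per category.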
Funding: Supported by the National Natural Science Foundation of China (U24A20532 and 22278146), the Guangdong Basic and Applied Basic Research Team Fund (2024B1515040016), and the Fundamental Research Funds for the Central Universities.
Abstract: Electronic specialty gases play vital roles in key chip manufacturing processes such as lithography, etching, deposition, and cleaning. While their ultra-high purity requirements (≥99.999%) make separation challenging, insufficient physicochemical data have hindered adsorbent development. To bridge this gap, we constructed a multidimensional database covering 101 semiconductor-related molecules with 19 physical parameters, and developed a Bayesian regression-based collaborative prediction model that demonstrates high accuracy (R² = 0.95-0.97) on test sets. We further constructed the balanced data-augmented Transformer-based molecular property prediction (BD-TMPP) model to address the overfitting problem in small-sample learning. This model achieves end-to-end prediction of the molecular quadrupole moment (R² = 0.99) and polarizability (R² = 0.98) by capturing interatomic spatial correlations. Compared with traditional density functional theory calculations, the model improves computational efficiency by five orders of magnitude while maintaining accuracy, demonstrating a successful application of the "structure-property relationship" theory in chemical machine learning.
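To make the reported metric concrete, here is a minimal sketch of Bayesian linear regression with one feature, scored by R². It is not the authors' collaborative model or BD-TMPP: the functions `fit_bayesian_line` and `r_squared`, the prior precision `alpha`, the noise precision `beta`, and the toy data are all assumptions chosen for illustration.

```python
def fit_bayesian_line(xs, ys, alpha=1e-6, beta=1.0):
    """Posterior-mean weights (w0, w1) for y = w0 + w1 * x.

    Assumes a zero-mean Gaussian prior on the weights with precision
    `alpha` and Gaussian noise with precision `beta` (both hypothetical
    settings, not values from the abstract).
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision matrix: alpha * I + beta * X^T X = [[a, b], [b, d]]
    a = alpha + beta * n
    b = beta * sx
    d = alpha + beta * sxx
    det = a * d - b * b
    # Posterior mean: inverse precision times beta * X^T y
    t0, t1 = beta * sy, beta * sxy
    w0 = (d * t0 - b * t1) / det
    w1 = (a * t1 - b * t0) / det
    return w0, w1


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data (made up for illustration, not taken from the database).
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 3.0, 5.0, 7.0]
    w0, w1 = fit_bayesian_line(xs, ys)
    preds = [w0 + w1 * x for x in xs]
    print(w0, w1, r_squared(ys, preds))
```

With a very small `alpha` the prior is nearly flat, so the posterior mean approaches the ordinary least-squares fit; a larger `alpha` shrinks the weights toward zero, which is one common way to limit overfitting on small samples.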