Funding: Supported by the National Natural Science Foundation of China (No. 21676157 and No. 21520102008).
Abstract: Enzymatic reactions proceed with high chemo-, regio-, and stereoselectivity, which makes them attractive for the direct functionalization of abundant, inexpensive compounds containing C-H bonds into fine chemicals such as high-value intermediates and pharmaceuticals. This review summarizes recent progress in the enzymatic functionalization of C-H bonds, with an emphasis on heme enzymes such as cytochrome P450s, chloroperoxidase, and unspecific peroxygenases. Specific examples are discussed to show how molecular and process engineering approaches are applied to overcome the challenges that hinder enzymatic C-H functionalization. Also discussed is the recent development of chemo-enzymatic cascades as an effective way to combine metal catalysis and enzymatic catalysis for C-H functionalization.
Funding: Supported by the National Key Research and Development Program of China (No. 2022YFC2105900).
Abstract: Recent advancements in data technology offer immense opportunities for the discovery and development of new enzymes for the green synthesis of chemicals. Current protein databases predominantly prioritize overall sequence matches. The multi-scale features underpinning catalytic mechanisms and processes, which are scattered across various data sources, have not been sufficiently integrated to be used effectively in enzyme mining. In this study, we developed a workflow driven by the evaluation of sequence and taxonomic features to discover enzymes that can be expressed in E. coli and catalyze chemical reactions in vitro. Alcohol oxidase (AOX), which catalyzes the conversion of methanol to formaldehyde, was used as a demonstration. A dataset of 21 reported AOXs was used to construct sequence scoring rules based on features including sequence length, structural motifs, catalysis-related residues, binding residues, and overall structure. These scoring rules were applied to filter the results of HMM-based searches, yielding 357 candidate sequences of eukaryotic origin, which were categorized into six classes at 85% sequence similarity. Experimental validation was conducted in two rounds on 31 selected sequences representing all classes. Among these selected sequences, 19 were expressed as soluble proteins in E. coli, and 18 of these soluble proteins exhibited AOX activity, as predicted. Notably, the most active recombinant AOX exhibited an activity of 8.65 ± 0.29 U/mg, approaching the highest activity of native eukaryotic enzymes. Compared to the established UniProt-annotation-based workflow, this feature-evaluation-based approach yielded a higher probability of obtaining highly active recombinant AOX (from 8.3% to 19.4%), demonstrating the efficiency and potential of this multi-dimensional feature evaluation method in accelerating the discovery of active enzymes.
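The filtering step described in this abstract (scoring candidate sequences against feature-based rules and keeping those above a cutoff) can be illustrated with a minimal sketch. The abstract does not give the actual rules, so the length window, the motif pattern, the cutoff, and the names `score_sequence` and `filter_candidates` are all hypothetical placeholders, and only two of the listed feature types (sequence length and motif presence) are shown.

```python
import re
from typing import Dict

# Hypothetical placeholders for illustration only; these are NOT the
# scoring rules, motifs, or thresholds used in the study.
LENGTH_RANGE = (10, 20)                      # accepted sequence length window
MOTIFS = {"example_motif": re.compile(r"G.G..G")}  # toy glycine-rich pattern


def score_sequence(seq: str) -> int:
    """Return one point per satisfied feature rule."""
    score = 0
    # Rule 1: sequence length falls inside the accepted window.
    if LENGTH_RANGE[0] <= len(seq) <= LENGTH_RANGE[1]:
        score += 1
    # Rule 2: one point for each motif found anywhere in the sequence.
    for pattern in MOTIFS.values():
        if pattern.search(seq):
            score += 1
    return score


def filter_candidates(candidates: Dict[str, str], min_score: int) -> Dict[str, str]:
    """Keep only the candidates whose score reaches the cutoff."""
    return {
        name: seq
        for name, seq in candidates.items()
        if score_sequence(seq) >= min_score
    }


if __name__ == "__main__":
    # Toy sequences standing in for hits from a homology search.
    hits = {
        "cand_a": "MAGSGAAGLLKH",  # right length, motif present
        "cand_b": "MKKLLAAPPQRS",  # right length, no motif
        "cand_c": "MAGSG",         # too short, no motif
    }
    print(filter_candidates(hits, min_score=2))
```

In a real pipeline the rules would be derived from the reference set of characterized enzymes, and residue-level checks would need an alignment to that reference rather than a plain pattern search.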
Funding: This work is supported by the National Key Research and Development Program of China (No. 2022YFC2105900).
Abstract: Descriptors play a pivotal role in enzyme design for the greener synthesis of biochemicals, as they can characterize enzymes and chemicals from physicochemical and evolutionary perspectives. This study examined the effects of various descriptors on the performance of a Random Forest model used to predict enzyme-chemical relationships. We curated activity data for seven specific enzyme families from the literature and developed a pipeline for evaluating machine learning model performance using 10-fold cross-validation. The influence of protein and chemical descriptors was assessed in three scenarios: predicting the activity of unknown relations between known enzymes and known chemicals (new relationship evaluation), predicting the activity of novel enzymes on known chemicals (new enzyme evaluation), and predicting the activity of new chemicals on known enzymes (new chemical evaluation). The results showed that protein descriptors significantly enhanced the classification performance of the model in the new enzyme evaluation in three of the seven datasets, those with the greatest number of enzymes, whereas chemical descriptors appeared to have no effect. A variety of sequence-based and structure-based protein descriptors were constructed, among which the ESM-2 descriptor achieved the best results. Using enzyme families as labels showed that the descriptors could cluster proteins well, which could explain their contributions to the machine learning model. Conversely, in the new chemical evaluation, chemical descriptors yielded significant improvement in four of the seven datasets, while protein descriptors appeared to have no effect. We attempted to evaluate the generalization ability of the model by correlating dataset statistics with model performance. The results showed that datasets with higher sequence similarity were more likely to achieve better results in the new enzyme evaluation, and datasets with more enzymes were more likely to benefit from the protein descriptor strategy. This work provides guidance for the development of machine learning models for specific enzyme families.
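The three evaluation scenarios differ only in what is held out in each cross-validation fold: individual enzyme-chemical pairs, whole enzymes, or whole chemicals. The sketch below shows one way such splits could be built with the standard library; it is an assumption about the splitting logic, not the authors' pipeline, and the function names `pair_level_splits` and `group_level_splits` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]            # (enzyme_id, chemical_id)
Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def _folds(items: Sequence, k: int, seed: int) -> List[list]:
    """Shuffle the items and deal them into k folds."""
    items = list(items)
    if k > len(items):
        raise ValueError("k cannot exceed the number of items to split")
    random.Random(seed).shuffle(items)
    return [items[i::k] for i in range(k)]


def pair_level_splits(pairs: Sequence[Pair], k: int = 10, seed: int = 0) -> Iterator[Split]:
    """New relationship evaluation: hold out random enzyme-chemical pairs."""
    for held_out in _folds(range(len(pairs)), k, seed):
        test = sorted(held_out)
        held = set(test)
        train = [i for i in range(len(pairs)) if i not in held]
        yield train, test


def group_level_splits(pairs: Sequence[Pair], column: int, k: int = 10, seed: int = 0) -> Iterator[Split]:
    """Hold out whole groups so test entities never appear in training.

    column=0 holds out enzymes (new enzyme evaluation);
    column=1 holds out chemicals (new chemical evaluation).
    """
    groups = sorted({pair[column] for pair in pairs})
    for held_out in _folds(groups, k, seed):
        held = set(held_out)
        test = [i for i, pair in enumerate(pairs) if pair[column] in held]
        train = [i for i, pair in enumerate(pairs) if pair[column] not in held]
        yield train, test


if __name__ == "__main__":
    # Toy data: 5 enzymes x 4 chemicals, every pair measured.
    data = [(f"E{e}", f"C{c}") for e in range(5) for c in range(4)]
    for train_idx, test_idx in group_level_splits(data, column=0, k=5):
        print(sorted({data[i][0] for i in test_idx}), len(train_idx))
```

Holding out whole enzymes or chemicals prevents a model from scoring well merely by recognizing entities it has already seen, which is why descriptor quality would matter more in those two scenarios than in the pair-level split.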