A dynamic geometry system, an important application in the field of geometric constraint solving, is widely used in elementary mathematics education; it is also a fundamental environment for automated theorem proving in geometry. In a geometric constraint solving process, a situation involving a critical point is often encountered, and geometric element degeneracy may occur at this point. Such degeneracy situations usually deserve close attention during learning and exploration. However, many degeneracy situations cannot be completely presented even by well-known dynamic geometry software. In this paper, the mechanisms causing the degeneracy of a geometric element are analyzed, and relevant definitions and formalized descriptions of the problem are provided according to modern Euclidean geometry theories. To solve the problem, the data structure is optimized, and a domain model is designed for the geometric elements and their constraint relationships in the dynamic geometry system; furthermore, an update algorithm for the elements is proposed based on the novel domain model. Instances show that the proposed domain model and update algorithm can effectively cope with geometric element degeneracy in the geometric constraint solving process, thereby unifying the dynamic geometry drawing with the geometric intuition of the user.
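To make the idea of a critical point concrete, the following is a minimal sketch of one familiar case: the intersection of two circles, where two intersection points merge into one at tangency and then vanish. It is not the domain model or update algorithm of the paper; the function name `circle_intersections` and the tolerance `eps` are assumptions chosen for illustration.

```python
import math


def circle_intersections(c1, r1, c2, r2, eps=1e-9):
    """Return the intersection points of two circles as a list of (x, y).

    Illustrative only: 2 points in general position, 1 point at tangency
    (the critical point), and an empty list once the element degenerates.
    """
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d < eps:
        return []  # concentric circles: no isolated intersection points
    if d > r1 + r2 + eps or d < abs(r1 - r2) - eps:
        return []  # too far apart, or one circle strictly inside the other
    # Distance from c1 to the foot of the common chord along the center line.
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))  # half-length of the chord
    mx = x1 + a * (x2 - x1) / d
    my = y1 + a * (y2 - y1) / d
    if h < eps:
        return [(mx, my)]  # tangency: the two points have merged
    ox = -(y2 - y1) / d * h
    oy = (x2 - x1) / d * h
    return [(mx + ox, my + oy), (mx - ox, my - oy)]


# Dragging the second center away from the first passes through the critical point.
for dist in (1.0, 2.0, 3.0):
    print(dist, len(circle_intersections((0.0, 0.0), 1.0, (dist, 0.0), 1.0)))
```

A dynamic geometry system has to decide what happens to every element that depends on such an intersection point once it disappears, which is the kind of situation the abstract describes.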
Funding: Supported by the National Natural Science Foundation of China (No. 82202267, 82202095, 82372042, 82272084); the Regional Innovation and Development Joint Fund of the National Natural Science Foundation of China (No. U22A20345, U23A20478); the National Science Fund for Distinguished Young Scholars of China (No. 81925023); the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011); the Guangxi Natural Science Foundation (No. 2024GXNSFFA010014); Science and Technology Projects in Guangzhou (No. 2024A04J4977); the Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515011339); and the High-level Hospital Construction Project (No. DFJHBF202105).
Abstract: Background: Whole-slide images (WSIs) are foundational for artificial intelligence in tumor diagnosis, treatment planning, and prognosis prediction. Efficient management of WSI labels is crucial for clinical digitalization; however, manual or semiautomatic methods limit scalability. Enhancing automatic pathological label recognition is critical to advancing digital pathology, improving efficiency, and driving precision oncology. Methods: We developed AutoLDP, a method for automatic labeling of digital pathology, which identifies the textual information used for labeling slides. The method includes four steps: identifying text position using the CRAFT model, recognizing text content using the ParSeq model, identifying slice type using the ConvNext classifier, and combining the relevant information to generate a new name. The naming format is divided into four parts: pathology ID, wax block ID, staining type, and slice type. We used accuracy and processing time to validate our method on two validation sets. Results: The AutoLDP system was 20 times faster than manual labeling. On solid-state drives, the CRAFT + ParSeq combination processed the most files per minute of all methods: 136.95 in validation set 1 and 170.95 in validation set 2. We compared the proposed model with several commonly used text detection and recognition models, including ABinet, CRNN, TRBA, and Vitstr. We achieved an accuracy of 97.60% in just 87.62 s on validation set 1 (200 cases), which was significantly better than that of the other models. In addition, the accuracy reached 96.98% on validation set 2 (13,667 cases), confirming the generalization ability of the model. Conclusion: In this study, we proposed a new model, AutoLDP, which automates the extraction and recognition of key information from WSIs, enabling standardized naming and significantly improving labeling efficiency. This innovation supports the digital transformation of pathology and advances precision medicine.
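The reported throughput and the four-part naming format can be sketched as follows. The arithmetic uses only figures from the abstract: 200 cases in 87.62 s works out to about 136.96 files per minute, in line with the reported 136.95. The helper `build_slide_name`, its underscore separator, and the sample field values are hypothetical; the abstract does not specify how the four parts are joined.

```python
def files_per_minute(n_files: int, seconds: float) -> float:
    """Throughput implied by processing n_files in the given number of seconds."""
    return n_files / seconds * 60.0


def build_slide_name(pathology_id: str, block_id: str, stain: str,
                     slice_type: str, sep: str = "_") -> str:
    """Join the four parts named in the abstract into one file name.

    The separator and the field formats are assumptions for illustration.
    """
    parts = [pathology_id, block_id, stain, slice_type]
    if any(not part for part in parts):
        raise ValueError("all four name parts are required")
    return sep.join(parts)


# Validation set 1: 200 cases in 87.62 s.
print(round(files_per_minute(200, 87.62), 2))

# Validation set 2: 13,667 cases at the reported 170.95 files per minute
# would take roughly 80 minutes (a derived estimate, not a reported figure).
print(round(13667 / 170.95, 1))

# Hypothetical field values, for illustration only.
print(build_slide_name("P123456", "A1", "HE", "paraffin"))
```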
Abstract: Deep learning (DL) is a subfield of machine learning that has a significant impact on the extraction of new knowledge. DL makes it possible to extract advanced data representations and knowledge, and highly effective DL techniques help to uncover more hidden knowledge. DL has a promising future owing to its strong performance and accuracy. To leverage it effectively, we need to understand both its fundamentals and the state of the art. This paper surveys DL approaches, advantages, drawbacks, architectures, and methods to give a straightforward and clear understanding of the field from different viewpoints. Moreover, existing related methods are compared with one another, and applications of DL are described in areas such as medical image analysis and handwriting recognition.
Abstract: COVID-19 has caused severe health complications and produced a substantial adverse economic impact around the world. Forecasting the trend of COVID-19 infections could help in executing policies that effectively reduce the number of new cases. In this study, we apply a decomposition and ensemble model to forecast COVID-19 confirmed cases, deaths, and recoveries in Pakistan for the upcoming month, until the end of July. The data are decomposed with the Ensemble Empirical Mode Decomposition (EEMD) technique, which splits a series into small components called Intrinsic Mode Functions (IMFs). Each IMF is modelled with the Autoregressive Integrated Moving Average (ARIMA) model. The data used in this study were obtained from Pakistan's official, publicly available COVID-19 website, which is updated daily. Our analyses reveal that the numbers of recoveries, new cases, and deaths in Pakistan are increasing exponentially. Based on the selected EEMD-ARIMA model, the confirmed cases are expected to rise from 213,470 to 311,454 by 31 July 2020, an increase of almost 1.46 times, with a 95% prediction interval of 246,529 to 376,379. Recoveries are expected to rise from 100,802 to 193,495 by 31 July 2020, an increase of almost two times, with a 95% prediction interval of 162,414 to 224,579. Deaths are expected to increase from 4,395 to 6,751, which is almost 1.54 times, with a 95% prediction interval of 5,617 to 7,885. Thus, the COVID-19 forecasting results for Pakistan are alarming for the next month, until 31 July 2020. They also confirm that the EEMD-ARIMA model is useful for short-term forecasting of COVID-19 and that it is capable of tracking the real COVID-19 data in nearly all scenarios. The decomposition and ensemble strategy can help decision-makers develop short-term strategies based on the current number of disease occurrences until an appropriate vaccine is developed.
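The growth factors quoted above follow directly from the reported start and end values, as this short check shows. It uses only the numbers in the abstract; the helper names are illustrative. It also shows that the reported point forecasts for cases and deaths sit at the midpoint of their 95% prediction intervals, while the recovery forecast is within a couple of counts of its midpoint.

```python
def growth_ratio(start: float, end: float) -> float:
    """Multiplicative increase from start to end."""
    return end / start


def interval_midpoint(lower: float, upper: float) -> float:
    """Center of a prediction interval."""
    return (lower + upper) / 2


# (start value, forecast for 31 July 2020, 95% interval) as given in the abstract.
forecasts = {
    "confirmed cases": (213470, 311454, (246529, 376379)),
    "recoveries": (100802, 193495, (162414, 224579)),
    "deaths": (4395, 6751, (5617, 7885)),
}

for name, (start, end, (lower, upper)) in forecasts.items():
    ratio = growth_ratio(start, end)
    mid = interval_midpoint(lower, upper)
    print(f"{name}: x{ratio:.2f}, interval midpoint {mid:.1f}, forecast {end}")
```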
Funding: Supported by the Sichuan Science and Technology Program (2018GZDZX0041); the National Key R&D Program of China (2018YFB1005100, 2018YFB1005104); and the Specialized Fund for Science and Technology Platform and Talent Team Project of Guizhou Province (Qian Ke He Ping Tai Ren Cai [2016] 5609).
Abstract: Dynamic geometry software, as a kind of computer-assisted instruction (CAI) software, is closely associated with mathematics and is widely applied to mathematics teaching in primary and secondary schools. Meanwhile, web technology has also become important for assisting education and teaching. This paper describes a development process for web-based dynamic geometry software and analyses the specific requirements that such software places on a graphical application programming interface (API). Based on experiments comparing two HTML5 (hypertext markup language 5) graphical API technologies, scalable vector graphics (SVG) and Canvas, on different devices and browsers, we conclude that Canvas is the more suitable graphical API technology for web-based dynamic geometry software, and we further propose principles and methods for an object-oriented Canvas design. Dynamic geometry software based on the newly designed Canvas has technical advantages and educational value, and it incorporates aesthetic education into mathematics education.
Funding: This work was supported by the National Natural Science Foundation of China (61972109, 61632002) and the Natural Science Foundation of Guangdong Province of China (2018A030313380).
Abstract: Background: Coronaviruses can cross the species barrier and infect humans, causing a severe respiratory syndrome. SARS-CoV-2, which potentially originated in bats, is still circulating in China. In this study, a prediction model is proposed to evaluate the infection risk of non-human-origin coronaviruses for early warning. Methods: The spike protein sequences of 2666 coronaviruses were collected from the 2019 Novel Coronavirus Resource (2019nCoVR) Database of the China National Genomics Data Center on Jan 29, 2020. A total of 507 human-origin viruses were regarded as positive samples, whereas 2159 non-human-origin viruses were regarded as negative samples. To capture the key information of the spike protein, three feature encoding algorithms (amino acid composition, AAC; parallel correlation-based pseudo-amino-acid composition, PC-PseAAC; and G-gap dipeptide composition, GGAP) were used to train 41 random forest models. The optimal feature with the best performance was identified and then used with the multidimensional scaling method to explore the pattern of human coronaviruses. Results: The 10-fold cross-validation results showed that good performance was achieved with the GGAP (g=3) feature. The predictive model achieved a maximum accuracy (ACC) of 98.18% and a Matthews correlation coefficient (MCC) of 0.9638. Seven clusters of human coronaviruses (229E, NL63, OC43, HKU1, MERS-CoV, SARS-CoV, and SARS-CoV-2) were found. The cluster for SARS-CoV-2 was very close to that for SARS-CoV, which suggests that both viruses use the same human receptor (angiotensin converting enzyme II). The big gap in the distance curve suggests that the origin of SARS-CoV-2 is not clear and that surveillance in the field should continue. The smooth distance curve for SARS-CoV suggests that its close relatives still exist in nature and that the challenge to public health remains. Conclusions: The optimal feature (GGAP, g=3) performed well in predicting infection risk and could be used to explore evolutionary dynamics in a simple, fast, and large-scale manner. The study may be beneficial for the surveillance of coronavirus genome mutation in the field.
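As an illustration of the GGAP encoding, the sketch below computes a g-gap dipeptide composition under one common convention: count every residue pair separated by exactly g intervening positions and normalize by the number of counted pairs, giving a 400-dimensional vector. The abstract does not spell out the exact definition used, so the convention, the function name `ggap_composition`, and the handling of non-standard residues are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def ggap_composition(seq: str, g: int = 3) -> list[float]:
    """Frequency of residue pairs (seq[i], seq[i + g + 1]) over the sequence.

    Pairs containing a non-standard symbol (for example 'X') are skipped.
    With g = 0 this reduces to the ordinary adjacent-dipeptide composition.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


# Toy sequence: with g = 3 the counted pairs are A..F and C..A.
features = ggap_composition("ACDEFA", g=3)
print(len(features), features[PAIRS.index("AF")], features[PAIRS.index("CA")])
```

A vector like this, computed for each spike protein sequence, is the kind of input a random forest classifier could be trained on.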
Funding: Supported by grants from the National Key R&D Program of China (No. 2021YFF1201003); the National Science Fund for Distinguished Young Scholars (No. 81925023); the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101420006); the Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011); the High-level Hospital Construction Project (No. DFJHBF202105); and the National Science Foundation for Young Scientists of China (No. 82001986).
Abstract: Background: Artificial intelligence (AI) technology, represented by deep learning, has made remarkable achievements in digital pathology, enhancing the accuracy and reliability of diagnosis and prognosis evaluation. The spatial distribution of CD3+ and CD8+ T cells within the tumor microenvironment has been demonstrated to have a significant impact on the prognosis of colorectal cancer (CRC). This study aimed to investigate the prognostic ability of CD3_CT (CD3+ T-cell density in the core of the tumor [CT]) in patients with CRC by using AI technology. Methods: The study enrolled 492 patients from two medical centers, with 358 patients assigned to the training cohort and 134 patients to the validation cohort. To facilitate tissue segmentation and T-cell quantification in whole-slide images (WSIs), a fully automated workflow based on deep learning was devised. After tissue segmentation and subsequent cell segmentation, a comprehensive analysis was conducted. Results: The evaluation of various positive T-cell densities revealed comparable discriminatory ability between CD3_CT and CD3-CD8 (the combination of CD3+ and CD8+ T-cell densities within the CT and invasive margin) in predicting mortality (C-index in training cohort: 0.65 vs. 0.64; validation cohort: 0.69 vs. 0.69). CD3_CT was confirmed as an independent prognostic factor, with high CD3_CT density associated with increased overall survival (OS) in the training cohort (hazard ratio [HR]=0.22, 95% confidence interval [CI]: 0.12–0.38, P<0.001) and the validation cohort (HR=0.21, 95% CI: 0.05–0.92, P=0.037). Conclusions: We quantified the spatial distribution of CD3+ and CD8+ T cells within tissue regions in WSIs using AI technology. CD3_CT was confirmed as a stage-independent predictor of OS in CRC patients. Moreover, CD3_CT shows promise in simplifying the CD3-CD8 system and facilitating its practical application in clinical settings.
Funding: Supported by the National Natural Science Foundation of China (61972109, 62172114, 61632002).
Abstract: Background: Coronaviruses can be isolated from bats, civets, pangolins, birds, and other wild animals. As animal-origin pathogens, coronaviruses can cross the species barrier and cause pandemics in humans. In this study, a deep learning model for early prediction of pandemic risk was proposed based on viral genome sequences. Methods: A total of 3257 genomes were downloaded from the Coronavirus Genome Resource Library. We present a deep learning model of cross-species coronavirus infection that combines a bidirectional gated recurrent unit network with a one-dimensional convolution. The genome sequences of animal-origin coronaviruses were input directly to extract features and predict pandemic risk. The best performance was sought using a pre-trained DNA vector and an attention mechanism. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) were used to evaluate the predictive models. Results: The six specific models achieved good performance for the corresponding virus groups (1 for AUROC and 1 for AUPR). The general model with the pre-trained vector and attention mechanism provided excellent predictions for all virus groups (1 for AUROC and 1 for AUPR), while the models without the pre-trained vector or the attention mechanism showed an obvious reduction in performance (about 5–25%). Re-training experiments showed that the general model has good transfer learning capability (average for six groups: 0.968 for AUROC and 0.942 for AUPR) and should give reasonable predictions for the potential pathogen of the next pandemic. The artificial negative data, in which the coding region of the spike protein was replaced, were also predicted correctly (100% accuracy). An easy-to-use tool implementing our predictor was created in the Python programming language. Conclusions: A robust deep learning model with a pre-trained vector and attention mechanism mastered the features of the whole genomes of animal-origin coronaviruses and could predict the risk of cross-species infection for early warning of the next pandemic.
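The AUROC metric named in the abstract can be illustrated with a minimal pure-Python sketch. It uses the pairwise-ranking definition (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half). The function name `auroc` and the toy labels and scores are illustrative assumptions, not taken from the paper or its tool.

```python
def auroc(labels, scores):
    """Pairwise-ranking AUROC: fraction of (positive, negative) pairs in
    which the positive sample receives the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pandemic-risk genome, 0 = negative sample.
perfect = auroc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
mixed = auroc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])
```

An AUROC of 1, as reported for the specific and general models, means that every positive sample is ranked above every negative one.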
Funding: Supported by the National Major Research Instrument Development Project (62427811), the Key Program of the National Natural Science Foundation of China (62332006), and the General Program of the National Natural Science Foundation of China (62172014).
Abstract: A specialized computer, the Electronic Probe Computer (EPC), has been developed to address large-scale NP-complete problems. The EPC employs a hybrid serial/parallel computational model, structured around four main subsystems: a converting system, an input/output system, and an operating system. The converting system is a software component that transforms the target problem into the graph coloring problem, while the operating system is designed to solve these graph coloring problems. The system comprises 60 probe computing cards and is therefore referred to as EPC60. To test EPC60 on large-scale graph coloring problems, 100 3-colorable graphs were randomly selected, each consisting of 2,000 vertices. The state-of-the-art mathematical optimization solver achieved a success rate of only 6%, while EPC60 achieved a 100% success rate. Additionally, EPC60 successfully solved two 3-colorable graphs with 1,500 and 2,000 vertices that Gurobi had failed to solve in 15 days on a standard workstation. Because NP-complete problems are, in theory, mutually reducible in polynomial time, the EPC stands out as a universal solver for NP-complete problems. The EPC can be applied to various problems that can be abstracted as combinatorial optimization problems, making it relevant across multiple domains, including supply chain management, financial services, telecommunications, energy systems, manufacturing, and beyond.
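To make the benchmark task concrete, the following minimal sketch states the 3-coloring problem in code: a checker for proper colorings and a conventional backtracking search. This is a textbook approach with exponential worst-case running time, not the EPC's probe-based method, which the abstract does not describe. The names `is_proper_coloring` and `three_color` are illustrative assumptions.

```python
def is_proper_coloring(edges, coloring):
    """A coloring is proper if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def three_color(n, edges):
    """Backtracking search for a 3-coloring of vertices 0..n-1.
    Returns a list of colors (0, 1, 2) or None if no 3-coloring exists."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = [-1] * n  # -1 marks an uncolored vertex

    def assign(v):
        if v == n:
            return True
        for c in range(3):
            # Try color c only if no already-colored neighbor uses it.
            if all(colors[w] != c for w in adj[v]):
                colors[v] = c
                if assign(v + 1):
                    return True
        colors[v] = -1  # undo before backtracking
        return False

    return colors if assign(0) else None


# A 5-cycle is 3-colorable; the complete graph on 4 vertices is not.
cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
cycle_coloring = three_color(5, cycle5)
k4_coloring = three_color(4, k4)
```

Checking a proposed coloring takes time linear in the number of edges, while finding one is the hard part. That asymmetry is what makes success rates on 2,000-vertex instances a meaningful benchmark.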
Abstract: Background: Influenza B virus can cause epidemics with high pathogenicity, so it poses a serious threat to public health. A feature representation algorithm is proposed in this paper to identify the pathogenicity phenotype of influenza B virus. Methods: The dataset included all 11 influenza virus proteins encoded in the eight genome segments of 1724 strains. Two types of features were used hierarchically to build the prediction model. Amino acid features were derived directly from 67 feature descriptors and input into a random forest classifier to output informative features about the class label and the probabilistic prediction. A sequential forward search strategy was used to optimize the informative features. The final features for each strain had low dimensions, included knowledge from different perspectives, and were used to build the machine learning model for pathogenicity identification. Results: Forty signature positions were obtained by entropy screening. Mutations at position 135 of the hemagglutinin protein had the highest entropy value (1.06). After the informative features were generated directly from the 67 random forest models, the dimensions of the class and probabilistic features were optimized to 4 and 3, respectively. The optimal class features had a maximum accuracy of 94.2% and a maximum Matthews correlation coefficient of 88.4%, while the optimal probabilistic features had a maximum accuracy of 94.1% and a maximum Matthews correlation coefficient of 88.2%. The optimized features outperformed the original informative features and the amino acid features from individual descriptors. The sequential forward search strategy performed better than the classical ensemble method. Conclusions: The optimized informative features had the best performance and were used to build a predictive model that identifies the phenotype of influenza B virus with high pathogenicity and provides early risk warning for disease control.
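The entropy screening step can be sketched as a per-position Shannon entropy over aligned sequences, followed by ranking positions from most to least variable. This is a minimal sketch under stated assumptions: the abstract does not give the logarithm base or the alignment procedure, so the base-2 default, the function names `position_entropy` and `rank_positions`, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter


def position_entropy(column, base=2.0):
    """Shannon entropy of the residues observed at one alignment position.
    The logarithm base is an assumption; the abstract does not state it."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def rank_positions(sequences, top_k):
    """Return the indices of the top_k most variable positions
    (highest entropy first, ties broken by position index)."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    scored = [(position_entropy([s[i] for s in sequences]), i) for i in range(length)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:top_k]]


# Hypothetical toy alignment of four short sequences.
toy = ["ACA", "ACG", "ATT", "ATC"]
most_variable = rank_positions(toy, 1)
```

A fully conserved position has entropy 0, and entropy grows as more residues appear at similar frequencies, so high-entropy positions are natural candidates for signature positions.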
Funding: Supported by the Sichuan Science and Technology Program of China under Grant Nos. 2018GZDZX0041 and 2020YFG0011, the National Natural Science Foundation of China under Grant No. 11701118, the Guangzhou Academician and Expert Workstation under Grant No. 20200115-9, and the Key Disciplines of Guizhou Province of China-Computer Science and Technology under Grant No. ZDXK[2018]007.
Abstract: A dynamic geometry system, an important application of geometric constraint solving, is widely used in elementary mathematics education; it is also a fundamental environment for automated theorem proving in geometry. During geometric constraint solving, a critical point is often encountered at which a geometric element may degenerate. Such degeneracy situations usually deserve close attention during learning and exploration. However, many of them cannot be presented completely even by well-known dynamic geometry software. In this paper, the mechanisms that cause a geometric element to degenerate are analyzed, and definitions and formalized descriptions of the problem are provided according to the relevant modern Euclidean geometry theories. To solve the problem, the data structure is optimized, and a domain model is designed for the geometric elements and their constraint relationships in the dynamic geometry system; furthermore, an update algorithm for the elements is proposed based on the new domain model. Instances show that the proposed domain model and update algorithm can effectively cope with geometric element degeneracy in the geometric constraint solving process, thereby unifying the dynamic geometry drawing with the geometric intuition of the user.
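One simple illustration of a critical point is the intersection of a line with a circle: as the line is dragged, the two intersection points merge at tangency and then cease to exist. The sketch below classifies these three situations explicitly, so that a dependent element can be updated rather than silently dropped. It is an illustrative reading of "degeneracy" and not the paper's domain model or update algorithm. The function name, the status labels, and the tolerance `eps` are assumptions.

```python
import math


def line_circle_intersections(p, d, center, r, eps=1e-9):
    """Intersect the line p + t*d with the circle (center, r).
    Returns (status, points), where status is 'regular' (two points),
    'degenerate' (tangency, the two points coincide), or 'undefined'."""
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = d[0] ** 2 + d[1] ** 2
    if a == 0:
        raise ValueError("direction vector must be non-zero")
    b = 2 * (fx * d[0] + fy * d[1])
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * c  # the sign of the discriminant decides the case
    if disc < -eps:
        return "undefined", []
    if abs(disc) <= eps:
        t = -b / (2 * a)
        return "degenerate", [(p[0] + t * d[0], p[1] + t * d[1])]
    s = math.sqrt(disc)
    ts = [(-b - s) / (2 * a), (-b + s) / (2 * a)]
    return "regular", [(p[0] + t * d[0], p[1] + t * d[1]) for t in ts]


# Drag a horizontal line upward across the unit circle.
crossing = line_circle_intersections((-2, 0), (1, 0), (0, 0), 1)
tangent = line_circle_intersections((-2, 1), (1, 0), (0, 0), 1)
missing = line_circle_intersections((-2, 2), (1, 0), (0, 0), 1)
```

Returning an explicit status instead of an empty list lets the drawing layer decide how to present the merged or missing element.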