Funding: Supported by the National Natural Science Foundation of China (U1204608, 61472370, 61672469, 61822701).
Abstract: Ribonucleic acid (RNA) hybridization is widely used in popular RNA simulation software in bioinformatics. However, limited by the exponential computational complexity of combinatorial problems, it is challenging to decide, within an acceptable time, whether a specific RNA hybridization is effective. We introduce a machine learning based technique to address this problem. The machine learning (ML) models tested in the training phase include algorithms based on the boosted tree (BT), random forest (RF), decision tree (DT) and logistic regression (LR), and the corresponding models are obtained. Given the RNA molecular coding training and testing sets, the trained machine learning models are applied to predict the classification of RNA hybridization results. The experimental results show that the optimal predictive accuracies are 96.2%, 96.6%, 96.0% and 69.8% for the RF, BT, DT and LR-based approaches, respectively, under the strong constraint condition, compared with traditional representative methods. Furthermore, the average computational efficiencies of the RF, BT, DT and LR-based approaches are 208,679, 269,756, 184,333 and 187,458 times higher than that of the existing approach, respectively. Given an RNA design, the BT-based approach demonstrates high computational efficiency and better predictive accuracy in determining the biological effectiveness of molecular hybridization.
Funding: Supported by the Humanities and Social Science Fund of the Chinese Ministry of Education (Grant No. 11YJC870010).
Abstract: Purpose: To further the understanding of Chinese Web users' image-seeking behavior, this study explores the kinds of images that Chinese Web users seek online and how they express their requests. Design/methodology/approach: We used five pairs of simulated keywords to collect 893 image-seeking questions from Baidu Zhidao. Then, we revised the subject category of questions to analyze popular image needs. In addition, we conducted content analysis and descriptive statistical analysis to identify image-seeking motivations and image features used in the requests in terms of the two theories of image feature classification and image use. Findings: Among the 893 questions, image searches for entertainment accounted for 47.59%, more than the searches for professional knowledge (37.40%) and personal daily activities (15.01%). With regard to motivation, over 60% of the questions were identified as used for learning, which is well over the proportion of questions used for illustrating. Thus, these questions requested images as sources of data rather than sources of objects. Non-visual features (47.58%) were used most frequently in question descriptions, slightly more often than semantic features (45.96%). Users who lacked domain knowledge tended to use general words rather than specific words to describe their requests. However, not many users used syntactic features when seeking images. Nevertheless, most of the users had a fairly clear idea about what the target image should look like. Research limitations: We studied only one question and answer (Q&A) community using five pairs of simulated keywords.
Practical implications: The findings should be helpful in strengthening the functionality of Q&A systems, promoting the theories of image feature classification, and shedding light on information literacy training. Originality/value: This study is one of the first research efforts to discuss Chinese Web users' daily image searches and querying behavior in natural language in a Q&A community, which should help to further the understanding of the principles of image-seeking behavior among Chinese Web users.
Abstract: Software development is becoming more complex as programs are used to accomplish more elaborate tasks. Developers may have a hard time knowing what could happen to the software when making changes. To help the developer reduce uncertainty about how source code changes affect run-time behavior, this paper presents a tool called IMPEX. It analyzes the differences in the source code and the differences in the run behavior of two subsequent software versions across the entire repository, showing the developer the impact that a change in the source code has had on the software run over the whole software history. This impact helps developers know what is affected during execution by their changes to the source code. This study verifies that the software runs most impacted by a given change in the source code have a higher chance of being impacted in the future whenever this part of the code is changed again. The approach taken in this paper correctly predicted what would be impacted in the software execution when a change in the source code was made in 70% of the cases.
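The abstract does not describe how IMPEX works internally, so the following is only a rough sketch of the general idea: diff two versions of a source file, then check which recorded runs behave differently. It uses Python's standard `difflib`. The trace format (a mapping from a run name to the ordered list of functions it called) and the names `changed_lines` and `impacted_runs` are assumptions made for this illustration.

```python
import difflib


def changed_lines(old_src, new_src):
    """Return added ('+ ') and removed ('- ') lines between two source versions."""
    diff = difflib.ndiff(old_src.splitlines(), new_src.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


def impacted_runs(old_traces, new_traces):
    """Return names of runs whose recorded trace differs between versions.

    Each argument is a hypothetical mapping: run name -> list of called functions.
    Only runs present in both versions are compared.
    """
    common = old_traces.keys() & new_traces.keys()
    return sorted(name for name in common if old_traces[name] != new_traces[name])


# Hypothetical example: one inserted call changes the behavior of one run.
old_src = "def area(w, h):\n    return w * h\n"
new_src = "def area(w, h):\n    check(w, h)\n    return w * h\n"
old_traces = {"test_area": ["area"], "test_io": ["read", "write"]}
new_traces = {"test_area": ["area", "check"], "test_io": ["read", "write"]}

print(changed_lines(old_src, new_src))
print(impacted_runs(old_traces, new_traces))
```

Pairing each source change with the set of runs it impacted, over the repository history, would give the kind of record from which future impact could be estimated.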
Funding: Supported by the Shenzhen Fundamental Research Program (JCYJ2020109150427184), the Shenzhen Science and Technology Program (KQTD20180411185028798), and the Shenzhen Fundamental Research Program (JCYJ20180508153249223).
Abstract: Purpose: This study aimed to examine the effect of radiation esophagitis (RE) and the dynamics of RE on subsequent survival in non-small cell lung cancer (NSCLC) patients who underwent radiotherapy. Experimental Design: Patients with NSCLC treated with fractionated thoracic radiotherapy and enrolled in prospective trials were eligible. RE was graded prospectively according to Common Terminology Criteria for Adverse Events (CTCAE) v3.0, per protocol requirement, weekly during RT and 1 month after RT. This study applied conditional survival assessment, which has an advantage over traditional survival analysis in that it assesses survival from the event instead of from the baseline. A P-value less than 0.05 was considered significant. The primary endpoint was overall survival. Results: A total of 177 patients were eligible, with a median follow-up of 5 years. The presence of RE, the maximum RE grade, the evolution of RE and the onset timing of RE events were all correlated with subsequent survival. At all conditional time points, patients who first presented with RE grade 1 (initial RE1) had significantly inferior subsequent survival (median multivariable HR: 1.63, all P-values < 0.05); meanwhile, those whose RE progressed had significantly inferior subsequent survival compared with those who never developed RE (median multivariable HR: 2.08, all P-values < 0.05). Multivariable Cox proportional-hazards analysis showed significantly higher C-indexes for models that included RE events than for those without (all P-values < 0.05). Conclusion: This study comprehensively evaluated the impact of RE with conditional survival assessment and demonstrated that RE is associated with inferior survival in NSCLC patients treated with RT.
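Conditional survival is commonly formalized as CS(t | s) = S(s + t) / S(s), the probability of surviving a further time t given survival to time s. The sketch below assumes that standard definition and pairs it with a plain Kaplan-Meier estimate of S. The function names and the toy follow-up values are illustrative assumptions; they are not data or code from the study, whose exact procedure is not given in the abstract.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per subject
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup: the last estimate at or before time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def conditional_survival(curve, s, t):
    """CS(t | s) = S(s + t) / S(s)."""
    base = survival_at(curve, s)
    if base == 0.0:
        raise ValueError("no subjects survive to the conditioning time")
    return survival_at(curve, s + t) / base


# Toy data (years of follow-up, event indicator); purely illustrative.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
print(conditional_survival(curve, s=1, t=2))
```

Conditioning on s moves the reference point from baseline to a later landmark, which is what lets survival be assessed from the time an event such as RE is observed.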
Funding: This research is generously supported by Riley Children's Foundation (J.X.) and AGA Foundation (J.X.). We acknowledge support from the IU Simon Cancer Center (grant number P30CA082709), the Purdue University Center for Cancer Research (grant number P30CA023168), and the Walther Cancer Foundation.
Abstract: The majority of non-melanoma skin cancer (NMSC) is cutaneous basal cell carcinoma (BCC) or squamous cell carcinoma (SCC), which are also called keratinocyte carcinomas, as both of them originate from keratinocytes. The incidence of keratinocyte carcinomas is over 5 million per year in the US, three-fold higher than the total incidence of all other types of cancer combined. While there are several reports on gene expression profiling of BCC and SCC, there are significant variations in the reported gene expression changes in different studies. One reason is that tumor-adjacent normal skin specimens were not included in many studies as matched controls. Furthermore, while numerous studies of skin stem cells in mouse models have been reported, their relevance to human skin cancer remains unknown. In this report, we analyzed gene expression profiles of paired specimens of keratinocyte carcinomas with their matched normal skin tissues as the control. Among several novel findings, we discovered a significant number of zinc finger encoding genes up-regulated in human BCC. In BCC, a novel link was found between hedgehog signaling, Wnt signaling, and the cilium. While the SCC cancer stem-cell gene signature is shared between human and mouse SCCs, the hair follicle stem-cell signature of mice was not highly represented in human SCC. Differential gene expression (DEG) in human BCC shares a gene signature with both bulge and epidermal stem cells. We have also determined that human BCCs and SCCs have distinct gene expression patterns, and some of them are not fully reflected in current mouse models.
Funding: Ms. Hirschfeld received support from multiple grants during the preparation of this manuscript: T32AG071444 and F31AG074700. Dr. Saykin receives support from multiple NIH grants (P30 AG010133, P30 AG072976, R01 AG019771, R01 AG057739, U19 AG024904, R01 LM013463, R01 AG068193, T32 AG071444, U01 AG068057 and U01 AG072177). Dr. Risacher receives support from NIH grants AG061788 and K01AG049050. Dr. Nho receives support from NIH grants R01 LM012535 and R03 AG054936.
Abstract: This literature review investigates the significant overlap between myelin-repair signaling pathways and pathways known to contribute to hallmark pathologies of Alzheimer's disease (AD). We discuss previously investigated therapeutic targets of amyloid, tau, and ApoE, as well as other potential therapeutic targets that have been empirically shown to contribute to both remyelination and progression of AD. Current evidence shows that there are multiple AD-relevant pathways which overlap significantly with remyelination and myelin repair through the encouragement of oligodendrocyte proliferation, maturation, and myelin production. There is a present need for a single, cohesive model of myelin homeostasis in AD. While determining a causative pathway is beyond the scope of this review, it may be possible to investigate the pathological overlap of myelin repair and AD through therapeutic approaches.
Abstract: Replication is an approach often used to speed up the execution of queries submitted to a large dataset. A compile-time/run-time approach is presented for minimizing the response time of 2-dimensional range queries when a distributed replica of a dataset exists. The aim is to partition the query payload (and its range) into subsets and distribute those to the replica nodes in a way that minimizes a client's response time. However, since query size and the distribution characteristics of data (dense/sparse regions) in varying ranges are not known a priori, performing efficient load balancing and parallel processing over the unpredictable workload is difficult. A technique based on the creation and manipulation of dynamic spatial indexes for query payload estimation in distributed queries was proposed. The effectiveness of this technique was demonstrated on queries for analysis of archived earthquake-generated seismic data records.
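To illustrate the partitioning aim stated above, the sketch below splits one axis of a range query into contiguous sub-ranges with roughly equal estimated payload. A per-bucket record-count histogram is used as a simple stand-in for the dynamic spatial index. The histogram, the greedy split rule, and the name `partition_range` are assumptions made for illustration; the paper's actual index and algorithm are not described in this abstract.

```python
def partition_range(bucket_counts, n_replicas):
    """Split buckets into up to n_replicas contiguous sub-ranges of similar load.

    bucket_counts : estimated record count per bucket along one query axis
    Returns a list of (start, end) bucket index pairs, end exclusive.
    """
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    total = sum(bucket_counts)
    parts = []
    start = 0
    running = 0
    for i, count in enumerate(bucket_counts):
        running += count
        # Close the current part once the cumulative load reaches the next
        # equal-share boundary; the last part absorbs whatever remains.
        reached = running * n_replicas >= total * (len(parts) + 1)
        if len(parts) < n_replicas - 1 and reached:
            parts.append((start, i + 1))
            start = i + 1
    if start < len(bucket_counts):
        parts.append((start, len(bucket_counts)))
    return parts


# Hypothetical density estimate: sparse buckets on the left, a dense one on the right.
counts = [5, 5, 10, 0, 10, 10, 20]
for lo, hi in partition_range(counts, 3):
    print((lo, hi), sum(counts[lo:hi]))
```

Splitting by estimated payload rather than by equal range width is what keeps a dense region from overloading a single replica. A 2-dimensional query could apply the same idea along each axis or over index cells.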
Funding: Supported by NIH under Grant No. 1R01HG004908-01, NSF of USA under Grant No. DBI-0845685 (YY), and the Gordon and Betty Moore Foundation for the Community Cyberinfrastructure for Marine Microbial Ecological Research and Analysis (CAMERA) Project (JW).
Abstract: Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
Funding: Supported by the National Institutes of Health, USA (Grant No. R01CA213466), awarded to YL, and by the Precision Health Initiative at Indiana University.
Abstract: Alternative splicing of pre-mRNA transcripts is an important regulatory mechanism that increases the diversity of gene products in eukaryotes. Various studies have linked specific transcript isoforms to altered drug response in cancer; however, few algorithms have incorporated splicing information into drug response prediction. In this study, we evaluated whether basal-level splicing information could be used to predict drug sensitivity by constructing doxorubicin-sensitivity classification models with splicing and expression data. We detailed splicing differences between sensitive and resistant cell lines by implementing quasi-binomial generalized linear modeling (QBGLM) and found altered inclusion of 277 skipped exons. We additionally conducted RNA-binding protein (RBP) binding motif enrichment and differential expression analysis to characterize cis- and trans-acting elements that potentially influence doxorubicin response-mediating splicing alterations. Our results showed that a classification model built with skipped exon data exhibited strong predictive power. We discovered an association between differentially spliced events and epithelial-mesenchymal transition (EMT) and observed motif enrichment, as well as differential expression of RBFOX and ELAVL RBP family members. Our work demonstrates the potential of incorporating splicing data into drug response algorithms and the utility of a QBGLM approach for fast, scalable identification of relevant splicing differences between large groups of samples.
Funding: This work was supported by National Science Foundation grant number ACI-1440396. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.
Abstract: We present version 1.2.0 of DASHMM, a general library implementing hierarchical multipole methods using the asynchronous multi-tasking HPX-5 runtime system. Compared with the previous release [10], this new version: (1) enables execution in both shared and distributed memory architectures; (2) extends DASHMM's infrastructure to support advanced multipole methods [18]; and (3) provides built-in implementations of both the Yukawa [15] potential and the Helmholtz [16] potential in the low frequency regime. These additions have not impacted the user interface, which remains simple and extensible.
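For context, the Yukawa (screened Coulomb) kernel has the form exp(-λr) / r. The sketch below evaluates it by direct O(N²) summation, the kind of brute-force baseline that hierarchical multipole methods are designed to accelerate. It does not use or reflect DASHMM's API. The function name `yukawa_direct`, the parameter name `lam`, and the omission of any normalization constant are assumptions made for this example.

```python
import math


def yukawa_direct(sources, charges, targets, lam):
    """Potential at each target: sum over j of q_j * exp(-lam * r_ij) / r_ij.

    sources, targets : lists of (x, y, z) points
    charges          : one charge per source
    Sources that coincide with a target (r == 0) are skipped.
    """
    potentials = []
    for tx, ty, tz in targets:
        phi = 0.0
        for (sx, sy, sz), q in zip(sources, charges):
            r = math.sqrt((tx - sx) ** 2 + (ty - sy) ** 2 + (tz - sz) ** 2)
            if r > 0.0:
                phi += q * math.exp(-lam * r) / r
        potentials.append(phi)
    return potentials


# With lam = 0 the kernel reduces to the unscreened 1/r (Coulomb-like) form.
src = [(0.0, 0.0, 0.0)]
print(yukawa_direct(src, [2.0], [(1.0, 0.0, 0.0)], lam=0.0))
print(yukawa_direct(src, [2.0], [(1.0, 0.0, 0.0)], lam=1.0))
```

The cost of this loop grows with the product of source and target counts, which is why tree-based multipole approximations matter for large particle sets.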