Abstract: The main objective of this paper is to discuss a general family of distributions generated from the symmetrical arcsine distribution. The considered family includes various asymmetrical and symmetrical probability distributions as special cases. A particular symmetrical member of this family is the Arcsine–Gaussian distribution. Key statistical properties of this distribution, including the quantile function, mean residual life, order statistics, and moments, are derived. The Arcsine–Gaussian parameters are estimated using two classical methods: the method of moments and maximum likelihood. A simulation study, covering the asymptotic distributions of all the point estimators and the 90% and 95% asymptotic confidence intervals, is performed to examine the efficiency of the two estimation methods numerically. The simulation results show that both the biases and the variances of the estimators tend to zero as the sample size increases, i.e., the estimators are consistent. In addition, as the sample size increases, the coverage probabilities of the confidence intervals approach the nominal levels, while their lengths decrease toward zero. Two real data sets from the medical field are used to illustrate the flexibility of the Arcsine–Gaussian distribution compared with the normal, logistic, and Cauchy models. The proposed distribution is versatile enough to fit real applications and can serve as a good alternative to the traditional Gaussian distribution.
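The abstract does not give the generator's formula. As a purely hypothetical illustration, suppose each member of the family has CDF F(x) = (2/π) arcsin(√G(x)) for a baseline CDF G; this is an assumption, not necessarily the paper's definition. Under it, G(X) follows the standard arcsine law on [0, 1], a symmetric G yields a symmetric F (because arcsin(√p) + arcsin(√(1 − p)) = π/2), and the quantile function is Q(u) = G⁻¹(sin²(πu/2)). The sketch takes G to be a normal CDF and uses Q for inverse-transform sampling; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def arcsine_gaussian_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF under the ASSUMED construction F(x) = (2/pi) * asin(sqrt(G(x))), G normal."""
    g = NormalDist(mu, sigma).cdf(x)
    return (2.0 / math.pi) * math.asin(math.sqrt(g))


def arcsine_gaussian_quantile(u: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Quantile of the assumed CDF: Q(u) = G^{-1}(sin^2(pi * u / 2))."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if u > 0.5:
        # The normal baseline is symmetric about mu, so Q(u) = 2*mu - Q(1 - u);
        # this avoids sin^2(...) rounding to exactly 1.0 when u is close to 1.
        return 2.0 * mu - arcsine_gaussian_quantile(1.0 - u, mu, sigma)
    p = math.sin(math.pi * u / 2.0) ** 2
    return NormalDist(mu, sigma).inv_cdf(p)


def sample(n: int, mu: float = 0.0, sigma: float = 1.0, seed: int = 0) -> list[float]:
    """Inverse-transform sampling: push uniform draws through the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()  # uniform on [0, 1); skip the (rare) exact 0.0
        if u > 0.0:
            draws.append(arcsine_gaussian_quantile(u, mu, sigma))
    return draws
```

Because the quantile function has a closed form under this assumption, simulated samples for a bias/variance or coverage study can be drawn directly with `sample`.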
Abstract: Background: Restricted Boltzmann machines (RBMs) have universal power for modeling (binary) joint distributions. Meanwhile, because of their restricted network structure, RBMs are comparatively easy to train with respect to approximation and inference. However, little work has fully exploited the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic, and epigenomic data. In cancer data analysis, the number of features/predictors is usually much larger than the sample size; this is known as the "p ≫ N" problem, which is also ubiquitous in other bioinformatics and computational biology fields. The "p ≫ N" problem makes the bias–variance trade-off more critical when designing statistical learning methods. However, to date, few RBM models have been designed specifically to address this issue. Methods: We propose a novel RBM model, called elastic restricted Boltzmann machines (eRBMs), which incorporates an elastic regularization term into the likelihood function to balance model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which trains eRBMs efficiently. Results: We obtain several theoretical results on the rationale for and properties of our model. We further evaluate our model on a challenging task: predicting dichotomized survival time from the molecular profiles of tumors. The test results show that the prediction performance of eRBMs is markedly superior to that of state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with "p ≫ N" problems and outperform traditional methods. Our model is a promising method for future cancer data analysis.
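The abstract does not state the eCD update rule. A minimal sketch, assuming the elastic term is an elastic-net penalty λ₁‖W‖₁ + λ₂‖W‖₂² subtracted from the log-likelihood, adds that penalty's (sub)gradient to a standard one-step contrastive divergence (CD-1) update for a binary RBM. The function name, learning rate, and penalty weights are hypothetical; this is not the authors' eCD algorithm.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def elastic_cd1_step(v0, W, b, c, lr=0.05, l1=1e-3, l2=1e-3, rng=None):
    """One CD-1 update for a binary RBM with an ASSUMED elastic-net penalty on W.

    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    b: visible biases; c: hidden biases. Biases are left unpenalized.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visibles, then to the hiddens.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # CD-1 approximation of the log-likelihood gradient.
    grad_W = (v0.T @ ph0 - v1.T @ ph1) / n
    grad_b = (v0 - v1).mean(axis=0)
    grad_c = (ph0 - ph1).mean(axis=0)
    # Assumed penalty: subtract the subgradient of l1*|W|_1 + l2*|W|_2^2.
    grad_W -= l1 * np.sign(W) + 2.0 * l2 * W
    # Gradient ascent on the penalized objective.
    return W + lr * grad_W, b + lr * grad_b, c + lr * grad_c
```

The L1 part drives many weights toward exactly zero (useful when p ≫ N), while the L2 part shrinks correlated weights together; the relative weights l1 and l2 would need tuning.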
Funding: Supported by the National Natural Science Foundation of China (No. 30930102, 81473033); the National Key R&D Program of China (No. 2016YFC0901404); the Digestive Medical Coordinated Development Center of Beijing Hospitals Authority (No. XXZ0204); the Science Foundation of Peking University Cancer Hospital (No. 2017-4); and the Open Project funded by the Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education/Beijing (No. 2017-10).
Abstract: Objective: To evaluate the accuracy of identifying cancer patients using medical claims data from a health insurance system in China, and to provide a basis for establishing a claims-based cancer surveillance system in China. Methods: We chose Hua County, Henan Province, as the study site and randomly selected 300 and 1,200 qualified inpatient electronic medical records (EMRs), as well as the New Rural Cooperative Medical Scheme (NCMS) claims records, for cancer patients in Hua County People's Hospital (HCPH) and Anyang Cancer Hospital (ACH), respectively, in 2017. Diagnostic information in the NCMS claims was evaluated at the individual level, and sensitivity and positive predictive value (PPV) were calculated using the EMRs as the gold standard. Results: The sensitivity of the NCMS was 95.2% (93.8%–96.3%) in ACH and 92.0% (88.3%–94.8%) in HCPH. The PPV of the NCMS was 97.8% (96.7%–98.5%) in ACH and 89.0% (84.9%–92.3%) in HCPH. Overall, the weighted combined sensitivity and PPV of the NCMS in Hua County were 93.1% and 92.1%, respectively. In both HCPH and ACH, sensitivity and PPV were significantly higher for common cancers than for uncommon cancers (P<0.01). Conclusions: Identification of cancer patients using the NCMS is accurate at the individual level, and it is therefore feasible to conduct claims-based cancer surveillance in areas of China not covered by cancer registries.
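Sensitivity and PPV against a gold standard reduce to two proportions from a cross-tabulation of claims-based versus EMR-based case status. The abstract does not say how its intervals were computed; the sketch below assumes the Wilson score interval, and the counts used in the test are hypothetical, not the study's data. Function names are illustrative.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (ASSUMED CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def claims_accuracy(tp: int, fp: int, fn: int) -> dict[str, tuple[float, float, float]]:
    """Sensitivity = TP/(TP+FN) and PPV = TP/(TP+FP), with EMRs as the gold standard.

    tp: cancer in both claims and EMR; fp: cancer in claims only; fn: cancer in EMR only.
    Returns {name: (estimate, lower, upper)}.
    """
    out = {}
    for name, denom in (("sensitivity", tp + fn), ("ppv", tp + fp)):
        lo, hi = wilson_ci(tp, denom)
        out[name] = (tp / denom, lo, hi)
    return out
```

Note that sensitivity and PPV share the same numerator (true positives) and differ only in whether missed cases or spurious claims enter the denominator.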
Funding: Supported by the Beijing Municipal Science & Technology Commission, Clinical Application and Development of Capital Characteristic (No. Z161100000516003); and the National Natural Science Foundation of China (No. 31871266).
Abstract: Objective: Current colorectal cancer (CRC) screening practices still face challenges such as low compliance, low specificity, and high cost. This study aimed to identify groups at high risk for CRC in the general population using regular health examination data. Methods: The study population consisted of more than 7,000 CRC cases and more than 140,000 controls. Using regular health examination data, a model for detecting CRC cases was derived with the classification and regression trees (CART) algorithm. Receiver operating characteristic (ROC) curves were used to evaluate model performance. The robustness and generalizability of the CART model were validated on independent datasets. In addition, the effectiveness of CART-based screening was compared with that of stool-based screening. Results: After data quality control, 4,647 CRC cases and 133,898 controls free of colorectal neoplasms were used for downstream analysis. The final CART model was built on four biomarkers (age, albumin, hematocrit, and percent lymphocytes). In the test set, the area under the ROC curve (AUC) of the CART model for detecting CRC was 0.88 [95% confidence interval (95% CI), 0.87–0.90]. At the cutoff yielding 99.0% specificity, the model's sensitivity was 62.2% (95% CI, 58.1%–66.2%), achieving a 63-fold enrichment of CRC cases. We validated the robustness of the method across subsets of the test set that differed in CRC incidence, degree of population aging, sex ratio, distribution of tumor stages and locations, and data source. Importantly, CART-based screening had a higher positive predictive value (1.6%) than the fecal immunochemical test (0.3%). Conclusions: As an alternative approach to the early detection of CRC, this study provides a low-cost method that uses regular health examination data to identify individuals at high risk for CRC for further examination. The approach can promote early detection of CRC, especially in developing countries such as China, where annual health examinations are common but regular CRC-specific screening is rare.
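The link between the reported specificity, sensitivity, and enrichment can be checked with Bayes' rule. Assuming "enrichment" means the PPV among screen-positives divided by the baseline prevalence (the abstract does not define it), the ratio tends to the positive likelihood ratio, sensitivity/(1 − specificity), as prevalence becomes small: 0.622/0.010 ≈ 62, close to the reported 63-fold; the small gap may reflect rounding of the specificity. The sketch also shows one way to pick a score cutoff at a target specificity from control scores; all names are hypothetical.

```python
import math


def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def enrichment(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fold increase of case prevalence among screen-positives (ASSUMED definition)."""
    return ppv(sensitivity, specificity, prevalence) / prevalence


def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Limit of the enrichment as prevalence approaches zero."""
    return sensitivity / (1.0 - specificity)


def threshold_at_specificity(control_scores: list[float], target_specificity: float) -> float:
    """Cutoff t such that at least the target fraction of controls score <= t.

    Individuals scoring above t are called screen-positive.
    """
    ranked = sorted(control_scores)
    k = math.ceil(target_specificity * len(ranked))
    return ranked[k - 1]
```

Because PPV depends on prevalence, the same cutoff gives a much lower PPV in a general screening population than in a case-enriched study sample, which is consistent with the low absolute PPVs quoted for both CART-based screening and FIT.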
Abstract: Objective: To derive clinical and imaging diagnostic rules for peripheral lung cancer using data mining techniques, to explore new ideas for its diagnosis, and to provide early-stage technical and knowledge support for computer-aided detection (CAD). Methods: Fifty-eight cases of peripheral lung cancer confirmed by clinical pathology were collected. After the clinical and CT attributes were identified and standardized, the data were imported into a database. The data were then analyzed comparatively using association rules (AR) from the knowledge discovery process, and the rough set (RS) reduction algorithm and the genetic algorithm (GA) of the general-purpose data analysis tool ROSETTA. Results: The genetic classification algorithm of ROSETTA generated about 5,000 diagnostic rules, the RS reduction algorithm (Johnson's algorithm) generated 51 diagnostic rules, and the AR algorithm generated 123 diagnostic rules. All three data mining methods essentially identified gender, age, cough, location, lobulation sign, shape, and ground-glass density as the main attributes for diagnosing peripheral lung cancer. Conclusion: The diagnostic rules obtained with the three data mining techniques are consistent with clinical diagnostic rules, and they can also be used to build the knowledge base of an expert system. This study demonstrates the potential value of data mining technology in clinical imaging diagnosis and differential diagnosis.
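Association rules such as those counted above are usually screened by support and confidence. The sketch computes both for one candidate rule over a few hypothetical case records; the attribute names echo the abstract, but the records are invented for illustration, and the study's minimum-support and minimum-confidence settings are not given.

```python
def support(records: list[frozenset[str]], itemset: set[str]) -> float:
    """Fraction of records that contain every item in the itemset."""
    items = frozenset(itemset)
    return sum(items <= r for r in records) / len(records)


def confidence(records: list[frozenset[str]], antecedent: set[str], consequent: set[str]) -> float:
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    sup_a = support(records, antecedent)
    if sup_a == 0:
        return 0.0
    return support(records, set(antecedent) | set(consequent)) / sup_a


# Hypothetical records: each is the set of findings present for one case.
records = [frozenset(r) for r in (
    {"cough", "lobulation sign", "ground-glass density"},
    {"cough", "lobulation sign"},
    {"lobulation sign", "irregular shape"},
    {"cough", "ground-glass density"},
)]
```

For the rule {lobulation sign} ⇒ {cough} on these records, support of the antecedent is 3/4 and confidence is 2/3; a miner such as Apriori enumerates all itemsets above a support threshold before scoring rules this way.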
Abstract: Prostate cancer (PCa) is one of the most common cancers among men globally. The authors aimed to evaluate the ability of the Prostate Imaging Reporting and Data System version 2 (PI-RADS v2) to classify men as having PCa, clinically significant PCa (CSPCa), or no PCa, especially among those with serum total prostate-specific antigen (tPSA) levels in the "gray zone" (4–10 ng ml⁻¹). A total of 308 patients (355 lesions) were enrolled in this study. Diagnostic efficiency was determined, and univariate and multivariate analyses, receiver operating characteristic curve analysis, and decision curve analysis were performed to identify and compare predictors of PCa and CSPCa. The results suggested that PI-RADS v2, tPSA, and prostate-specific antigen density (PSAD) were independent predictors of PCa and CSPCa. A PI-RADS v2 score cutoff of ≥4 provided high negative predictive values (91.39% for PCa and 95.69% for CSPCa). A model combining PI-RADS with PSA and PSAD helped to define a high-risk group (PI-RADS score = 5 and PSAD ≥0.15 ng ml⁻¹ cm⁻³ with tPSA in the gray zone, or PI-RADS score ≥4 with a high tPSA level), with detection rates of 96.1% for PCa and 93.0% for CSPCa, and a low-risk group, with detection rates of 6.1% for PCa and 2.2% for CSPCa. It was concluded that PI-RADS v2 can be used as a reliable and independent predictor of PCa and CSPCa. Combining the PI-RADS v2 score with PSA and PSAD could help in the prediction and diagnosis of PCa and CSPCa and may thus help to prevent unnecessary invasive procedures.
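The high-risk definition above can be written as a small decision rule. The thresholds come from the abstract (gray zone 4–10 ng ml⁻¹, PSAD ≥0.15 ng ml⁻¹ cm⁻³); "high tPSA" is not defined there and is assumed here to mean tPSA above 10 ng ml⁻¹, and inclusive gray-zone bounds are also an assumption. Because the low-risk criteria are not stated, the sketch only flags the high-risk group. An NPV helper illustrates the ≥4 cutoff; all names are hypothetical.

```python
def in_gray_zone(tpsa: float) -> bool:
    """tPSA within the 4-10 ng/ml 'gray zone' (inclusive bounds ASSUMED)."""
    return 4.0 <= tpsa <= 10.0


def is_high_risk(pirads: int, tpsa: float, psad: float) -> bool:
    """High-risk group as described in the abstract.

    ASSUMPTION: 'high tPSA' means tPSA > 10 ng/ml (not defined in the source).
    """
    gray_zone_rule = pirads == 5 and psad >= 0.15 and in_gray_zone(tpsa)
    high_psa_rule = pirads >= 4 and tpsa > 10.0
    return gray_zone_rule or high_psa_rule


def npv(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN), e.g. with PI-RADS < 4 treated as test-negative."""
    return true_negatives / (true_negatives + false_negatives)
```

Within the gray zone the rule requires both the highest PI-RADS score and an elevated PSAD, whereas above it a score of 4 suffices.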
Abstract: Lung cancer remains a significant global health challenge, and identifying it at an early stage is essential for improving patient outcomes. This study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. After applying log2 normalization to the gene expression data and conducting Wilcoxon rank-sum tests, the researchers evaluated various classifiers with incremental feature selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes, respectively. The 10-gene model is proposed for clinical implementation because of its simplicity, whereas the 74-gene model performed better in terms of specificity, AUC (area under the curve), and precision. The models were evaluated on sensitivity, AUC, and specificity, with the aim of achieving high sensitivity and AUC while maintaining reasonable specificity.
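The preprocessing and feature-selection loop can be sketched as follows, under stated assumptions: expression values are transformed as log2(x + 1) (the pseudocount is an assumption), genes are ordered by Wilcoxon rank-sum p-value, and IFS adds genes one at a time in that order, keeping the subset size with the best score. The `evaluate` callback stands in for cross-validated XGBoost performance, which is not reproduced here; function names are hypothetical.

```python
from typing import Callable, Sequence

import numpy as np
from scipy.stats import ranksums


def log2_normalize(expr: np.ndarray) -> np.ndarray:
    """log2(x + 1) transform; the +1 pseudocount is an ASSUMPTION."""
    return np.log2(expr + 1.0)


def rank_genes(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Order gene (column) indices by Wilcoxon rank-sum p-value, smallest first."""
    pvals = np.array([
        ranksums(expr[labels == 1, j], expr[labels == 0, j]).pvalue
        for j in range(expr.shape[1])
    ])
    return np.argsort(pvals, kind="stable")


def incremental_feature_selection(
    ranked: Sequence[int],
    evaluate: Callable[[list[int]], float],
    max_features: int,
) -> tuple[list[int], list[float]]:
    """Score the top-k genes for k = 1..max_features; return the best subset and all scores."""
    scores = []
    for k in range(1, min(max_features, len(ranked)) + 1):
        scores.append(evaluate([int(g) for g in ranked[:k]]))
    best_k = int(np.argmax(scores)) + 1
    return [int(g) for g in ranked[:best_k]], scores
```

In practice `evaluate` would train the chosen classifier on the given gene subset with cross-validation and return a metric such as AUC; the curve of scores against k is what allows a compact model (like the 10-gene one) to be chosen over a slightly better but larger one.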
Funding: Supported by a grant from the Central Level Public Welfare Research Institutes Basic Research Expenses of the Chinese Academy of Medical Sciences (No. 2023-RW320-05).
Abstract: According to the 2024 global cancer data from GLOBOCAN, liver cancer ranks as the 6th most common malignancy and the 3rd leading cause of cancer-related mortality worldwide [1]. Hepatocellular carcinoma (HCC) accounts for approximately 85%–90% of these cases [2,3]. Its incidence and mortality rates remain persistently high worldwide, and China has the highest incidence and mortality rates of the disease in the world [4]. Moreover, the majority of patients are diagnosed at intermediate or advanced stages. Identifying novel tumor biomarkers for early detection and implementing precision therapy have therefore long been a key focus of research.
Funding: Supported by the Associazione Italiana per la Ricerca sul Cancro (Grants No. 10529 and No. 12162) and by funds obtained through an Italian law that allows taxpayers to allocate a 0.5% share of their income tax contribution to a research institution of their choice.
Abstract: Colorectal cancers (CRCs) display a wide variety of genomic aberrations that may either be causally linked to their development and progression or serve as biomarkers of their presence. Recent advances in rapid high-throughput genetic and genomic analysis have helped to identify a plethora of alterations that could serve as new cancer biomarkers and thus help to improve CRC diagnosis, prognosis, and treatment. Each distinct data type (copy number variations, gene and microRNA expression, CpG island methylation) gives the investigator a different, partially independent, and complementary view of the entire genome. However, elucidating gene function will require more information than the analysis of a single data type can provide. Integrating knowledge obtained from different sources is becoming increasingly essential, both for obtaining an interdisciplinary view of large amounts of information and for cross-validating experimental results. Integrating numerous types of genetic and genomic data from public sources, using ad hoc bioinformatics tools and statistical methods, facilitates the discovery and validation of novel, informative biomarkers. This combinatory approach will also enable researchers to understand the associations between different biological pathways, mechanisms, and phenomena more accurately and comprehensively, and to gain new insights into the etiology of CRC.
Abstract: AIM: To investigate the expression patterns of long non-coding RNAs (lncRNAs) in gastric cancer. METHODS: Two publicly available human exon array data sets for gastric cancer and the corresponding normal tissue were downloaded from the Gene Expression Omnibus (GEO). We re-annotated the probes of the human exon arrays and retained the probes mapping uniquely to lncRNAs at the gene level. LncRNA expression profiles were generated using the robust multi-array average (RMA) method in Affymetrix Power Tools. The normalized data were then analyzed with the Bioconductor package Linear Models for Microarray Data (limma), and genes with adjusted P-values below 0.01 were considered differentially expressed. An independent data set was used to validate the results. RESULTS: Using the computational pipeline we established to re-annotate more than 6.5 million probes of the Affymetrix Human Exon 1.0 ST array, we identified 136,053 probes mapping uniquely to lncRNAs at the gene level. These probes correspond to 9,294 lncRNAs, covering nearly 76% of the GENCODE lncRNA data set. By analyzing GSE27342, which consists of 80 paired gastric cancer and adjacent normal tissue samples, we identified 88 lncRNAs that were differentially expressed in gastric cancer, some of which have been reported to play a role in cancer, such as LINC00152, taurine upregulated 1, urothelial cancer associated 1, Pvt1 oncogene, small nucleolar RNA host gene 1, and LINC00261. In the validation data set GSE33335, 59% of these differentially expressed lncRNAs showed significant expression changes (adjusted P-value < 0.01) in the same direction. CONCLUSION: We identified a set of lncRNAs that are differentially expressed in gastric cancer, providing useful information for the discovery of new biomarkers and therapeutic targets in gastric cancer.
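The selection step (adjusted P < 0.01) depends on the multiple-testing correction, which the abstract does not name; limma's `topTable` defaults to the Benjamini–Hochberg (BH) method, so the sketch assumes BH. It also shows one way to compute the "same direction" replication fraction in a validation set, assuming the denominator is all discovery lncRNAs. Function names and inputs are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (ASSUMED correction), returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def replication_rate(
    discovery: dict[str, float],
    validation: dict[str, tuple[float, float]],
    alpha: float = 0.01,
) -> float:
    """Fraction of discovery genes (name -> log fold change) that are significant in
    validation (name -> (log fold change, adjusted p)) with the same sign of change."""
    hits = 0
    for gene, lfc in discovery.items():
        if gene in validation:
            v_lfc, v_padj = validation[gene]
            if v_padj < alpha and (v_lfc > 0) == (lfc > 0):
                hits += 1
    return hits / len(discovery) if discovery else 0.0
```

Requiring both significance and a matching sign is stricter than significance alone, since a gene that flips direction between cohorts does not count as replicated.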
Abstract: Background: Owing to tobacco use and the consumption of alcohol and adulterated food, worldwide cancer incidence is increasing at an alarming rate. Since the last decade of the twentieth century, lung cancer has been the most common cancer type. This study aimed to determine the global status of lung cancer and to evaluate the use of computational methods for its early detection. Methods: We used lung cancer data from the United Kingdom (UK), the United States (US), India, and Egypt. For statistical analysis, we used incidence, mortality, and survival rates to better understand the critical state of lung cancer. Results: In the UK and the US, we found a significant decrease in lung cancer mortality over the period 1990–2014, whereas in India and Egypt the decrease was far less pronounced. Additionally, in the UK and the US, the survival rates of women with lung cancer were higher than those of men. We also found that data mining and evolutionary algorithms were efficient in lung cancer detection. Conclusions: Our findings provide a comprehensive overview of the incidence, mortality, and survival rates of lung cancer in the UK, the US, India, and Egypt. The combined use of data mining and evolutionary algorithms can be efficient in lung cancer detection.