Abstract: Lung cancer remains a significant global health challenge, and identifying it at an early stage is essential for improving patient outcomes. The study focuses on developing and optimizing gene expression-based models for classifying cancer types using machine learning techniques. After applying log2 normalization to the gene expression data and conducting Wilcoxon rank-sum tests, the researchers evaluated various classifiers combined with Incremental Feature Selection (IFS) strategies. The study culminated in two optimized models using the XGBoost classifier, comprising 10 and 74 genes respectively. The 10-gene model, owing to its simplicity, is proposed for easier clinical implementation, whereas the 74-gene model exhibited superior specificity, AUC (area under the curve), and precision. The models were evaluated on sensitivity, AUC, and specificity, with the aim of achieving high sensitivity and AUC while maintaining reasonable specificity.
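The pipeline named in this abstract (log2 normalization, Wilcoxon rank-sum ranking, then Incremental Feature Selection) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the pseudocount, the normal approximation without tie correction, the toy data, and in particular the nearest-centroid classifier, which stands in for XGBoost only so that the example runs on the standard library. It is not the authors' implementation.

```python
import math
from statistics import mean

def log2_normalize(values, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in values]

def rank_sum_z(a, b):
    """Absolute z-score of the Wilcoxon rank-sum statistic (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return abs(w - mu) / sigma

def centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for XGBoost).
    Assumes every class has at least two samples."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[f] for r in rows) for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((X[i][f] - m) ** 2 for f, m in zip(features, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y):
    """Rank features by rank-sum z-score, then grow the feature set one feature at a time
    and keep the smallest prefix with the best accuracy. Labels are assumed to be 0/1."""
    n_feat = len(X[0])
    scores = []
    for f in range(n_feat):
        pos = [row[f] for row, lab in zip(X, y) if lab == 1]
        neg = [row[f] for row, lab in zip(X, y) if lab == 0]
        scores.append(rank_sum_z(pos, neg))
    ranked = sorted(range(n_feat), key=lambda f: -scores[f])
    best_k, best_acc = 0, -1.0
    for k in range(1, n_feat + 1):
        acc = centroid_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict improvement keeps the smaller model on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc

# Hypothetical toy data: 6 samples x 3 genes; only gene 0 separates the classes.
raw = [[100, 5, 20], [120, 7, 22], [90, 6, 18], [10, 6, 21], [12, 5, 19], [8, 7, 23]]
labels = [1, 1, 1, 0, 0, 0]
X = [log2_normalize(row) for row in raw]
selected, accuracy = incremental_feature_selection(X, labels)
```

Keeping the smallest feature prefix on ties mirrors the abstract's preference for a compact model when performance is comparable.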
Abstract: The main objective of this paper is to discuss a general family of distributions generated from the symmetrical arcsine distribution. The family includes various asymmetrical and symmetrical probability distributions as special cases. A particular symmetrical member of this family is the Arcsine–Gaussian distribution. Key statistical properties of this distribution, including the quantile function, mean residual life, order statistics, and moments, are derived. The Arcsine–Gaussian parameters are estimated using two classical estimation methods: the method of moments and maximum likelihood. A simulation study, which provides the asymptotic distributions of all considered point estimators and 90% and 95% asymptotic confidence intervals, is performed to examine the estimation efficiency of the considered methods numerically. The simulation results show that both the biases and the variances of the estimators tend to zero as the sample size increases, i.e., the estimators are asymptotically consistent. In addition, as the sample size increases, the coverage probabilities of the confidence intervals approach the nominal levels, while the corresponding lengths decrease toward zero. Two real data sets from the medical field are used to illustrate the flexibility of the Arcsine–Gaussian distribution compared with the normal, logistic, and Cauchy models. The proposed distribution is versatile enough to fit real applications and can be used as a good alternative to the traditional Gaussian distribution.
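The abstract does not give the generator of the family. One standard construction, assumed here purely for illustration, composes the arcsine CDF on (0, 1) with a baseline CDF G, giving F(x) = (2/π)·arcsin(√G(x)) and the quantile function Q(p) = G⁻¹(sin²(πp/2)). With a Gaussian baseline the result is symmetric about the baseline mean. The function names below are hypothetical, and the paper's actual definition may differ.

```python
import math
from statistics import NormalDist

def arcsine_g_cdf(x, base=NormalDist()):
    """CDF under the assumed construction F(x) = (2/pi) * asin(sqrt(G(x)))."""
    return (2 / math.pi) * math.asin(math.sqrt(base.cdf(x)))

def arcsine_g_quantile(p, base=NormalDist()):
    """Quantile function Q(p) = G^{-1}(sin^2(pi * p / 2)) for 0 < p < 1."""
    return base.inv_cdf(math.sin(math.pi * p / 2) ** 2)

# Usage: the median coincides with the baseline mean because the law is symmetric.
median = arcsine_g_quantile(0.5)
```

A closed-form quantile function like this one also makes inverse-transform sampling straightforward, which is convenient for simulation studies of the kind the abstract describes.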
Abstract: Background: Restricted Boltzmann machines (RBMs) are endowed with the universal power of modeling (binary) joint distributions. Meanwhile, as a result of their restricted network structure, training RBMs confronts fewer difficulties in approximation and inference. However, little work has been done to fully exploit the capacity of these models to analyze cancer data, e.g., cancer genomic, transcriptomic, proteomic, and epigenomic data. Moreover, in cancer data analysis the number of features/predictors is usually much larger than the sample size, which is known as the "p >> N" problem and is also ubiquitous in other bioinformatics and computational biology fields. The "p >> N" problem makes the bias-variance trade-off more crucial when designing statistical learning methods. To date, however, few RBM models have been specifically designed to address this issue. Methods: We propose a novel RBM model, called the elastic restricted Boltzmann machine (eRBM), which incorporates an elastic regularization term into the likelihood function to balance model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which can train eRBMs efficiently. Results: We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model on a challenging task: predicting dichotomized survival time using the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is much superior to that of state-of-the-art methods. Conclusions: The proposed eRBMs are capable of dealing with "p >> N" problems and have superior modeling performance over traditional methods. Our novel model is a promising method for future cancer data analysis.
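To make the idea of a regularized contrastive divergence update concrete, here is a minimal sketch of one CD-1 step with a penalty on the weights. It assumes that the "elastic regularization term" is an elastic-net penalty of the form l1·|w| + l2·w², which the abstract does not confirm. The function names, learning rate, and penalty strengths are hypothetical, and this is not the authors' eCD algorithm.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def elastic_penalty_grad(w, l1, l2):
    """Gradient of l1*|w| + l2*w**2, using subgradient 0 at w == 0."""
    sign = (w > 0) - (w < 0)
    return l1 * sign + 2.0 * l2 * w

def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for each hidden unit j."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v)))) for j in range(len(c))]

def visible_probs(h, W, b):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h)))) for i in range(len(b))]

def sample(probs, rng):
    """Draw independent Bernoulli samples from the given probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

def cd1_step(v0, W, b, c, lr=0.1, l1=0.01, l2=0.01, rng=random):
    """One CD-1 update on a single binary visible vector, penalizing W in place."""
    ph0 = hidden_probs(v0, W, c)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one-step reconstruction
    ph1 = hidden_probs(v1, W, c)
    for i in range(len(b)):
        for j in range(len(c)):
            cd_grad = v0[i] * ph0[j] - v1[i] * ph1[j]
            W[i][j] += lr * (cd_grad - elastic_penalty_grad(W[i][j], l1, l2))
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return W, b, c
```

The L1 part pushes small weights toward zero while the L2 part shrinks large ones, which is the usual motivation for elastic-net penalties when features far outnumber samples.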
Funding: Supported by the National Natural Science Foundation of China (No. 30930102, 81473033); the National Key R&D Program of China (No. 2016YFC0901404); the Digestive Medical Coordinated Development Center of Beijing Hospitals Authority (No. XXZ0204); the Science Foundation of Peking University Cancer Hospital (No. 2017-4); and the Open Project funded by the Key Laboratory of Carcinogenesis and Translational Research, Ministry of Education/Beijing (No. 2017-10).
Abstract: Objective: To evaluate the accuracy of identifying cancer patients using medical claims data from a health insurance system in China, and to provide a basis for establishing a claims-based cancer surveillance system in China. Methods: We chose Hua County, Henan Province as the study site, and randomly selected 300 and 1,200 qualified inpatient electronic medical records (EMRs), as well as the New Rural Cooperative Medical Scheme (NCMS) claims records, for cancer patients in Hua County People's Hospital (HCPH) and Anyang Cancer Hospital (ACH) in 2017. Diagnostic information in NCMS claims was evaluated at the individual level, and sensitivity and positive predictive value (PPV) were calculated taking the EMRs as the gold standard. Results: The sensitivity of NCMS was 95.2% (93.8%-96.3%) in ACH and 92.0% (88.3%-94.8%) in HCPH. The PPV of NCMS was 97.8% (96.7%-98.5%) in ACH and 89.0% (84.9%-92.3%) in HCPH. Overall, the weighted and combined sensitivity and PPV of NCMS in Hua County were 93.1% and 92.1%, respectively. Sensitivity and PPV were significantly higher for common cancers than for non-common cancers in both HCPH and ACH (P<0.01). Conclusions: Identification of cancer patients using NCMS claims is accurate at the individual level, and it is therefore feasible to conduct claims-based cancer surveillance in areas of China not covered by cancer registries.
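The two accuracy measures reported in this abstract follow directly from a comparison of claims against the gold standard. The sketch below computes sensitivity and PPV from counts and attaches a Wilson score interval. The abstract does not say which interval method was used, so the Wilson interval is an assumption, and any counts passed in are hypothetical rather than the study's data.

```python
import math

def sensitivity_ppv(tp, fp, fn):
    """Sensitivity = TP / (TP + FN); PPV = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
sens, ppv = sensitivity_ppv(tp=90, fp=10, fn=30)
sens_ci = wilson_interval(90, 90 + 30)
```

Here true positives are patients flagged in claims and confirmed in the EMR, false negatives are EMR-confirmed patients missing from claims, and false positives are claims-flagged patients not confirmed in the EMR.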
Funding: Supported by funding from the Beijing Municipal Science & Technology Commission, Clinical Application and Development of Capital Characteristic (No. Z161100000516003), and the National Natural Science Foundation of China (No. 31871266).
Abstract: Objective: Challenges remain in current practices of colorectal cancer (CRC) screening, such as low compliance, low specificity, and high cost. This study aimed to identify high-risk groups for CRC in the general population using regular health examination data. Methods: The study population consisted of more than 7,000 CRC cases and more than 140,000 controls. Using regular health examination data, a model for detecting CRC cases was derived with the classification and regression trees (CART) algorithm. Receiver operating characteristic (ROC) curves were used to evaluate model performance. The robustness and generalization of the CART model were validated on independent datasets. In addition, the effectiveness of CART-based screening was compared with that of stool-based screening. Results: After data quality control, 4,647 CRC cases and 133,898 controls free of colorectal neoplasms were used for downstream analysis. The final CART model was based on four biomarkers (age, albumin, hematocrit, and percent lymphocytes). In the test set, the area under the ROC curve (AUC) of the CART model was 0.88 [95% confidence interval (95% CI), 0.87-0.90] for detecting CRC. At the cutoff yielding 99.0% specificity, the model's sensitivity was 62.2% (95% CI, 58.1%-66.2%), thereby achieving a 63-fold enrichment of CRC cases. We validated the robustness of the method across subsets of the test set with diverse CRC incidences, aging rates, gender ratios, distributions of tumor stages and locations, and data sources. Importantly, CART-based screening had a higher positive predictive value (1.6%) than the fecal immunochemical test (0.3%). Conclusions: As an alternative approach for the early detection of CRC, this study provides a low-cost method that uses regular health examination data to identify individuals at high risk of CRC for further examination. The approach can promote early detection of CRC, especially in developing countries such as China, where annual health examinations are popular but regular CRC-specific screening is rare.
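For readers unfamiliar with CART, the core step is choosing, for one variable at a time, the threshold that minimizes the weighted Gini impurity of the two resulting groups. The sketch below shows that single step on one numeric feature with binary labels. The data and function names are hypothetical, and a full CART model (recursive splitting, pruning, multiple features) is deliberately out of scope.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split of the form value <= threshold.
    The threshold is None if no split improves on the unsplit impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical example: age separates cases (1) from controls (0) perfectly.
ages = [30, 40, 50, 60, 70, 80]
is_case = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(ages, is_case)
```

A tree built this way yields human-readable cutoffs on routine measurements, which is part of what makes such a model practical for screening from health examination data.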
Abstract: Objective: To examine the clinical and imaging diagnostic rules for peripheral lung cancer using data mining techniques, to explore new ideas for its diagnosis, and to provide early-stage technology and knowledge support for computer-aided detection (CAD). Methods: Fifty-eight cases of peripheral lung cancer confirmed by clinical pathology were collected. After the clinical and CT finding attributes had been identified and standardized, the data were imported into a database. The data were then analyzed comparatively using Association Rules (AR) from the knowledge discovery process, and the Rough Set (RS) reduction algorithm and Genetic Algorithm (GA) of the generic data analysis tool ROSETTA. Results: The genetic classification algorithm of ROSETTA generated approximately 5,000 diagnostic rules. The RS reduction algorithm (Johnson's algorithm) generated 51 diagnostic rules, and the AR algorithm generated 123 diagnostic rules. All three data mining methods treated gender, age, cough, location, lobulation sign, shape, and ground-glass density as the main attributes for diagnosing peripheral lung cancer. Conclusion: The diagnostic rules for peripheral lung cancer obtained with the three data mining techniques are consistent with clinical diagnostic rules, and they can also be used to build the knowledge base of an expert system. This study demonstrates the potential value of data mining technology in clinical imaging diagnosis and differential diagnosis.
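Association rules of the kind mentioned in this abstract are usually scored by support and confidence. The sketch below computes both for a single candidate rule over a set of records. The attribute names and records are invented for illustration and do not come from the study, and the abstract does not state which thresholds or mining algorithm were used.

```python
def support_confidence(records, antecedent, consequent):
    """Score the rule antecedent -> consequent over records (a list of attribute sets).
    support = share of records containing both sides;
    confidence = share of antecedent-matching records that also contain the consequent."""
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical records: each set lists the findings present in one case.
records = [
    {"lobulation", "cough", "upper_lobe"},
    {"lobulation", "upper_lobe"},
    {"cough"},
    {"lobulation"},
]
support, confidence = support_confidence(records, {"lobulation"}, {"upper_lobe"})
```

A rule miner enumerates many such candidate rules and keeps those whose support and confidence exceed chosen thresholds, which is one reason the number of generated rules can vary widely between methods.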
Abstract: Prostate cancer (PCa) is one of the most common cancers among men globally. The authors aimed to evaluate the ability of the Prostate Imaging Reporting and Data System version 2 (PI-RADS v2) to classify men with PCa, clinically significant PCa (CSPCa), or no PCa, especially among those with serum total prostate-specific antigen (tPSA) levels in the "gray zone" (4-10 ng ml⁻¹). A total of 308 patients (355 lesions) were enrolled in this study. Diagnostic efficiency was determined. Univariate and multivariate analyses, receiver operating characteristic curve analysis, and decision curve analysis were performed to determine and compare the predictors of PCa and CSPCa. The results suggested that PI-RADS v2, tPSA, and prostate-specific antigen density (PSAD) were independent predictors of PCa and CSPCa. A PI-RADS v2 score ≥4 provided high negative predictive values (91.39% for PCa and 95.69% for CSPCa). A model combining PI-RADS with PSA and PSAD helped to define a high-risk group (PI-RADS score = 5 and PSAD ≥0.15 ng ml⁻¹ cm⁻³ with tPSA in the gray zone, or PI-RADS score ≥4 with a high tPSA level) with a detection rate of 96.1% for PCa and 93.0% for CSPCa, and a low-risk group with a detection rate of 6.1% for PCa and 2.2% for CSPCa. It was concluded that PI-RADS v2 could be used as a reliable and independent predictor of PCa and CSPCa. Combining the PI-RADS v2 score with PSA and PSAD could be helpful in the prediction and diagnosis of PCa and CSPCa and, thus, may help prevent unnecessary invasive procedures.
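The high-risk definition in this abstract is a compound rule, and writing it as code makes the two branches explicit. The sketch below encodes only what the abstract states, with two labeled assumptions: "high tPSA level" is read as above the upper gray-zone bound (10 ng/ml), and the PSAD cutoff is read as 0.15. The low-risk rule is not given in the abstract, so it is not modeled. This is an illustration, not clinical guidance.

```python
def is_high_risk(pirads, tpsa, psad, gray_zone=(4.0, 10.0), psad_cutoff=0.15):
    """Return True if a patient falls in the high-risk group as described in the abstract.

    Branch 1: PI-RADS = 5 and PSAD >= cutoff, with tPSA inside the gray zone.
    Branch 2: PI-RADS >= 4 with tPSA above the gray zone (assumed meaning of "high tPSA").
    """
    low, high = gray_zone
    in_gray_zone = low <= tpsa <= high
    if pirads == 5 and psad >= psad_cutoff and in_gray_zone:
        return True
    if pirads >= 4 and tpsa > high:
        return True
    return False

# Usage with hypothetical patients: (PI-RADS score, tPSA in ng/ml, PSAD).
flags = [is_high_risk(5, 6.0, 0.20), is_high_risk(4, 6.0, 0.20), is_high_risk(4, 12.0, 0.05)]
```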
Abstract: Context and Objectives: Stomach cancer ranks fifth in incidence and fourth in mortality worldwide. In Senegal, there were 597 new cases in 2020, with a mortality rate of almost 70%. The aim of this study was to develop a machine-learning model for predicting death from stomach cancer 5 years after treatment. Methods: Our study sample consisted of 262 patients treated for gastric cancer at Aristide le Dantec Hospital and followed postoperatively between 2007 and 2020. We developed a multilayer perceptron with optimized hyperparameters and compared its performance with standard classification algorithms. We also augmented our data using a set of synthetic data generators to evaluate the behaviour of the model when given a larger amount of data. Results: Our model obtained an accuracy of 97.5%, outperforming the SVM (93%), RF (93.8%), and KNN (92.7%) models. Synthetic data improved accuracy by 1.5%. Our study showed that the most unfavourable prognostic factors were the appearance of hepatic metastases or adenopathy, smoking, and the infiltrative and stenosing aspects of the tumour on endoscopy. Conclusion: Our model predicted death from gastric cancer with very high accuracy, outperforming standard classification algorithms. Increasing the training data improved accuracy. Our study will help doctors to personalize the management of gastric cancer patients.
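Abstract: Shanghai was the first city in China to carry out population-based cancer registration. Since establishing its cancer registry system in 1963, and through continuous institutional improvement, technological innovation, and standardized management, Shanghai has had its cancer registry data included in Cancer Incidence in Five Continents (CI5) nine consecutive times since 1982, making it the first cancer registry in mainland China whose data quality has been recognized by an international authority. This article systematically reviews the development of cancer registration in Shanghai, focusing on how its data collection, coding standardization, quality control, information management, and data analysis and use have met the stringent requirements of the International Agency for Research on Cancer. Drawing on policy documents such as the Healthy Shanghai Action: Implementation Plan for Cancer Prevention and Control (2023-2030), it proposes that future cancer registration work further strengthen data sharing, multidimensional comprehensive surveillance, and the application of artificial intelligence. By combining international standards with local practice, the article aims to provide a reference for optimizing China's cancer registration system.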
Abstract: A research study collected intensive longitudinal data from cancer patients on a daily basis as well as non-intensive longitudinal survey data on a monthly basis. Although the daily data need separate analysis, they can also be used to generate predictors of monthly outcomes. This work addresses alternatives for generating such predictors from daily data. Analyses are reported for depression, measured by the Patient Health Questionnaire 8, as the monthly survey outcome. Daily measures include numbers of opioid medications taken, numbers of pain flares, least pain levels, and worst pain levels. Predictors are averages of recent non-missing values for each daily measure recorded on or prior to the survey dates for depression values. Weights for recent non-missing values are based on the number of days between the measurement of a recent value and the survey date. Five alternative averages are considered: averages with unit weights, averages with reciprocal weights, weighted averages with reciprocal weights, averages with exponential weights, and weighted averages with exponential weights. Adaptive regression methods based on likelihood cross-validation (LCV) scores are used to generate fractional polynomial models for possible nonlinear dependence of depression on each average. For all four daily measures, the best LCV score over averages of all types is generated using the average of recent non-missing values with reciprocal weights. The generated models are nonlinear and monotonic. The results indicate that an appropriate choice would be to assume three recent non-missing values and use the average with reciprocal weights of the first three recent non-missing values.
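The five averages listed in this abstract can be sketched in a few lines, though the abstract leaves the exact formulas open. The version below rests on three assumptions that should be checked against the paper: the reciprocal weight is 1/(days + 1), the exponential weight is exp(-days), and an "average" divides the weighted sum by the number of values while a "weighted average" divides by the sum of the weights. All names and data are hypothetical.

```python
import math

def recent_values(daily, survey_day, k=3):
    """Up to k (days_before_survey, value) pairs for the most recent non-missing
    daily values recorded on or before survey_day. Missing values are None."""
    out = []
    for day in sorted(daily, reverse=True):
        if day <= survey_day and daily[day] is not None:
            out.append((survey_day - day, daily[day]))
            if len(out) == k:
                break
    return out

def weight(days, kind):
    """Weight for a value measured `days` before the survey (assumed forms)."""
    if kind == "unit":
        return 1.0
    if kind == "reciprocal":
        return 1.0 / (days + 1)
    if kind == "exponential":
        return math.exp(-days)
    raise ValueError(f"unknown weight kind: {kind}")

def average(pairs, kind, normalize_by_weights=False):
    """Weighted sum divided by the count (default) or by the sum of weights.
    Returns None when there are no recent values."""
    if not pairs:
        return None
    total = sum(weight(d, kind) * v for d, v in pairs)
    denom = sum(weight(d, kind) for d, _ in pairs) if normalize_by_weights else len(pairs)
    return total / denom

# Hypothetical daily worst-pain scores keyed by study day; day 2 is missing.
daily = {1: 4, 2: None, 3: 6, 5: 8, 7: 2}
pairs = recent_values(daily, survey_day=6, k=3)
predictor = average(pairs, "reciprocal")
```

With k = 3 this matches the abstract's closing recommendation in spirit: take the three most recent non-missing values and down-weight older ones by a reciprocal function of their age.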
Funding: Supported by Associazione Italiana per la Ricerca sul Cancro, Grants No. 10529 and No. 12162, and by funds obtained through an Italian law that allows taxpayers to allocate a 0.5% share of their income tax contribution to a research institution of their choice.
Abstract: Colorectal cancers (CRCs) display a wide variety of genomic aberrations that may be causally linked to their development and progression or may serve as biomarkers for their presence. Recent advances in rapid high-throughput genetic and genomic analysis have helped to identify a plethora of alterations that can potentially serve as new cancer biomarkers and thus help to improve CRC diagnosis, prognosis, and treatment. Each distinct data type (copy number variations, gene and microRNA expression, CpG island methylation) provides an investigator with a different, partially independent, and complementary view of the entire genome. However, elucidating gene function will require more information than can be provided by analyzing a single type of data. Integrating knowledge obtained from different sources is becoming increasingly essential for obtaining an interdisciplinary view of large amounts of information and for cross-validating experimental results. Integrating numerous types of genetic and genomic data derived from public sources, using ad hoc bioinformatics tools and statistical methods, facilitates the discovery and validation of novel, informative biomarkers. This combinatory approach will also enable researchers to understand more accurately and comprehensively the associations between different biological pathways, mechanisms, and phenomena, and to gain new insights into the etiology of CRC.