Background: Population-based cancer survival is a key metric in evaluating the overall effectiveness of health services and cancer control activities. Advances in information technology enable accurate vital status tracking through multi-source data linkage. However, its reliability for survival estimates in China is unclear.
Methods: We analyzed data from Dalian Cancer Registry to evaluate the reliability of multi-source data linkage for population-based cancer survival estimates in China. Newly diagnosed cancer patients in 2015 were included and followed until June 2021. We conducted single-source data linkage by linking patients to Dalian Vital Statistics System, and multi-source data linkage by further linking to Dalian Household Registration System and the hospital medical records. Patient vital status was subsequently determined through active follow-up via telephone calls, referred to as comprehensive follow-up, which served as the gold standard. Using the cohort method, we calculated 5-year observed survival and age-standardized relative survival for 20 cancer types and all cancers combined.
Results: Compared to comprehensive follow-up, single-source data linkage overestimated 5-year observed survival by 3.2% for all cancers combined, ranging from 0.1% to 8.6% across 20 cancer types. Multi-source data linkage provided a relatively complete patient vital status, with an observed survival estimate only 0.3% higher for all cancers, ranging from 0% to 1.5% across 20 cancer types.
Conclusion: Multi-source data linkage contributes to reliable population-based cancer survival estimates in China. Linkage of multiple databases might be of great value in improving the efficiency of follow-up and the quality of survival data for cancer patients in developing countries.
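The comparison described in the Methods can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the ten patients, the source names, and the `observed_survival` helper are hypothetical and not taken from the study, and every patient is assumed to have a full five years of potential follow-up, so observed survival reduces to a simple proportion. It shows why deaths missing from a linked source inflate the survival estimate.

```python
from typing import FrozenSet, List, Optional, Tuple

# Toy patient record (hypothetical): years from diagnosis to death (None if
# still alive) and the set of sources in which that death is recorded.
Patient = Tuple[Optional[float], FrozenSet[str]]


def observed_survival(patients: List[Patient],
                      sources: FrozenSet[str],
                      horizon: float = 5.0) -> float:
    """Share of patients with no death visible in `sources` within `horizon` years.

    Assumes no censoring: every patient could be followed for the full horizon.
    """
    survivors = 0
    for death_year, recorded_in in patients:
        death_seen = (
            death_year is not None
            and death_year <= horizon
            and bool(recorded_in & sources)
        )
        if not death_seen:
            survivors += 1
    return survivors / len(patients)


# Illustrative cohort of 10 patients (made-up numbers, not study data).
cohort: List[Patient] = (
    [(None, frozenset())] * 5                                   # alive at 5 years
    + [(1.0, frozenset({"vital_stats", "telephone"}))] * 3      # death in vital statistics
    + [(2.5, frozenset({"household", "telephone"}))]            # death only in household registry
    + [(4.0, frozenset({"telephone"}))]                         # death found only by telephone
)

SINGLE = frozenset({"vital_stats"})
MULTI = frozenset({"vital_stats", "household", "hospital"})
GOLD = MULTI | frozenset({"telephone"})  # comprehensive follow-up

single_est = observed_survival(cohort, SINGLE)
multi_est = observed_survival(cohort, MULTI)
gold_est = observed_survival(cohort, GOLD)
print(single_est, multi_est, gold_est)
```

In this toy cohort, each additional source can only reveal more deaths, so the estimate moves down toward the gold-standard value as sources are added; the size of the gap in real data depends on how many deaths each source misses.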
A standard assumption when modelling linked sample data is that the stochastic properties of the linking process and the process underpinning the population values of the response variable are independent of one another. This is often referred to as non-informative linkage. But what if linkage errors are informative? In this paper, we provide results from two simulation experiments that explore two potential informative linking scenarios. The first is where the choice of sample record to link is dependent on the response; the second is where the probability of correct linkage is dependent on the response. We focus on the important and widely applicable problem of estimating domain means given linked data, and provide empirical evidence that while standard domain estimation methods can be substantially biased in the presence of informative linkage errors, an alternative estimation method, based on a Gaussian approximation to a maximum likelihood estimator that allows for non-informative linkage error, performs well.
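The second scenario, where the probability of a correct link depends on the response, can be sketched with a small simulation. Everything below is an assumption made for illustration rather than the paper's design: the two domains, the normal responses, the link probabilities, and the simple moment correction (which assumes a known, response-independent correct-link rate `lam`) are hypothetical, and the correction is not the Gaussian-approximation estimator the abstract refers to.

```python
import random
import statistics


def simulate(informative: bool, n_per_domain: int = 20000, seed: int = 1):
    """Return (true mean, naive mean, corrected mean) for domain A.

    Hypothetical setup: a wrong link attaches the response of a randomly
    chosen register record from either domain.
    """
    rng = random.Random(seed)
    # True responses held on the register for two domains, A and B.
    y_a = [rng.gauss(0.0, 3.0) for _ in range(n_per_domain)]
    y_b = [rng.gauss(10.0, 3.0) for _ in range(n_per_domain)]
    register = y_a + y_b
    median_a = statistics.median(y_a)

    linked = []
    for y in y_a:
        if informative:
            # Correct-link probability depends on the response (average 0.80).
            p_correct = 0.95 if y < median_a else 0.65
        else:
            # Non-informative linkage: same probability for every record.
            p_correct = 0.80
        if rng.random() < p_correct:
            linked.append(y)                      # correct link
        else:
            linked.append(rng.choice(register))   # wrong link

    true_mean = statistics.fmean(y_a)
    naive_mean = statistics.fmean(linked)
    # Moment correction that assumes non-informative linkage with rate lam:
    # E[linked y] = lam * mean_A + (1 - lam) * mean_register.
    lam = 0.80
    corrected = (naive_mean - (1 - lam) * statistics.fmean(register)) / lam
    return true_mean, naive_mean, corrected


for flag in (False, True):
    true_mean, naive_mean, corrected = simulate(informative=flag)
    print(f"informative={flag}: true={true_mean:.3f} "
          f"naive={naive_mean:.3f} corrected={corrected:.3f}")
```

Under these assumptions the correction recovers the domain mean when linkage is non-informative but remains biased when the correct-link probability varies with the response, even though the average link rate is the same in both runs.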
Funding: Supported by the National Key R&D Program of China (2022YFC3600805 and 2021YFC2501900).