Abstract: In his classic 1987 book on multiple imputation (MI), Rubin used the fraction of missing information, γ, to define the relative efficiency (RE) of MI as $\mathrm{RE} = (1 + \gamma/m)^{-1/2}$, where m is the number of imputations. This led to the conclusion that a small m (≤5) would be sufficient for MI. However, evidence has been accumulating that many more imputations are needed. Why would the m that appears sufficient according to the RE actually be too small? The answer may lie with γ. In this research, γ was determined at fractions of missing data (δ) of 4%, 10%, 20%, and 29% using the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey (NAMCS). The γ values were strikingly small, ranging from the order of $10^{-6}$ to 0.01. As δ increased, γ usually increased but sometimes decreased. How the data were analysed had a dominant effect on γ, overshadowing the effect of δ. The results suggest that γ cannot be predicted from δ and that it may not be appropriate to use the γ-based RE to determine a sufficient m.
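A minimal Python sketch of the RE formula shows why it points to a small m. With γ of the order reported here (at most about 0.01), RE is already essentially 1 with a single imputation. The function name `relative_efficiency` is an illustrative assumption. The value γ = 0.3 is an arbitrary larger value added only for contrast. This sketch does not reproduce the NAMCS analysis.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Relative efficiency of MI with m imputations, RE = (1 + gamma/m) ** (-1/2).

    gamma is the fraction of missing information (assumed to lie in [0, 1]).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return (1.0 + gamma / m) ** -0.5


# gamma = 1e-6 and 0.01 bracket the range reported in the abstract;
# gamma = 0.3 is an arbitrary larger value added for contrast.
for gamma in (1e-6, 0.01, 0.3):
    for m in (1, 5, 20):
        print(f"gamma={gamma:g}  m={m:2d}  RE={relative_efficiency(gamma, m):.6f}")
```

RE depends on m only through γ/m. A very small γ therefore makes almost any m look sufficient, which is consistent with the abstract's caution against using the γ-based RE to choose m.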
Abstract: As urban areas expand and growing populations contribute to a changing climate, the demand for better emergency response systems has become more important than ever. Human Behaviour Classification (HBC) systems have started to play a vital role by analysing data from different sources to detect signs of emergencies. These systems are used in many critical areas, such as healthcare, public safety, and disaster management, to shorten response times and support advance preparation. However, detecting human behaviour under such stressful conditions is not simple: it often involves noisy data, missing information, and the need to react in real time. This review examines HBC research published between 2020 and 2025 and aims to answer five specific research questions. These questions cover the types of emergencies discussed in the literature, the datasets and sensors used, the effectiveness of machine learning (ML) and deep learning (DL) models, and the limitations that remain in this field. We explored 120 papers that used different types of datasets. Some were based on sensor data, others on social media, and a few used hybrid approaches. Commonly used models included CNNs, LSTMs, and reinforcement learning methods for identifying behaviours. Although much progress has been made, the review found ongoing problems in combining sensors effectively, reacting quickly enough, and using more diverse datasets. Overall, our findings indicate that the focus should be on building systems that combine multiple sensors, gather real-time data at scale, and produce more interpretable results. Privacy and ethical concerns also need proper attention.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 62371499, U23A20483, 82102130); in part by the Department of Science and Technology of Shandong Province (No. SYS202208); in part by the Suzhou Science and Technology Bureau (No. SJC2021023); in part by the Guangdong Basic and Applied Basic Research Foundation (No. 2023A1515011305); and in part by the Guangzhou Basic and Applied Basic Research Foundation (No. 2023A04J2112).
Abstract: Breast cancer is one of the most common malignancies among women worldwide. Magnetic resonance imaging (MRI), as the final non-invasive diagnostic tool before biopsy, produces detailed free-text reports that support clinical decision-making. Using the information in MRI reports effectively to make reliable decisions is therefore crucial for patient care. This study proposes a novel method for BI-RADS classification using breast MRI reports. Large language models are employed to transform free-text reports into structured reports. Specifically, missing category information (MCI) is handled by assigning default values in the structured reports to categories that are absent from the free-text reports. To ensure data privacy, a locally deployed Qwen-Chat model is employed. To enhance domain-specific adaptability, a knowledge-driven prompt is designed, and the Qwen-7B-Chat model is fine-tuned specifically for structuring breast MRI reports. To prevent information loss and enable comprehensive learning of all report details, a fusion strategy is introduced that combines free-text and structured reports to train the classification model. Experimental results show that the proposed BI-RADS classification method outperforms existing report classification methods across multiple evaluation metrics. An external test set from a different hospital is also used to validate the robustness of the proposed approach. The proposed structured method also outperforms GPT-4o. Ablation experiments confirm that the knowledge-driven prompt, MCI, and the fusion strategy are crucial to the model's performance.
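The following Python sketch illustrates two steps from the abstract. The first is the MCI step, where categories missing from a report receive default values. The second is one simple way of combining free-text and structured reports. The category schema, the default strings, the `[SEP]` separator, and both function names are hypothetical. The study's actual schema and fusion strategy are not specified here. The sketch also assumes the LLM extraction step has already produced a dictionary.

```python
from typing import Dict, Optional

# Hypothetical schema: category names and default values are illustrative
# assumptions, not the schema used in the study.
DEFAULTS: Dict[str, str] = {
    "breast_composition": "not reported",
    "background_parenchymal_enhancement": "not reported",
    "mass": "absent",
    "non_mass_enhancement": "absent",
    "kinetic_curve": "not reported",
}


def fill_missing_categories(extracted: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a structured report containing every schema category.

    Categories that are absent, None, or empty in the extracted output receive
    the schema's default value; keys outside the schema are dropped.
    """
    structured = {}
    for category, default in DEFAULTS.items():
        value = extracted.get(category)
        structured[category] = value if value else default
    return structured


def fuse(free_text: str, structured: Dict[str, str]) -> str:
    """One simple (assumed) fusion: append the structured fields to the free
    text so that a downstream text classifier sees both."""
    fields = "; ".join(f"{k}: {v}" for k, v in structured.items())
    return f"{free_text} [SEP] {fields}"


# Example: the LLM extracted only two categories, one of them empty.
llm_output = {"mass": "irregular enhancing mass, 12 mm", "kinetic_curve": None}
report = fill_missing_categories(llm_output)
print(fuse("Irregular enhancing mass in the left breast.", report))
```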
Abstract: We consider multivariate small area estimation under nonignorable, not missing at random (NMAR) nonresponse. We assume a response model that accounts for the different patterns of the observed outcomes (which values are observed and which are missing), and we estimate the response probabilities by applying the Missing Information Principle (MIP). Under this principle, we first derive the likelihood score equations for the case where the missing outcomes are actually observed. We then integrate the unobserved outcomes out of the score equations with respect to the distribution holding for the missing data. This distribution is defined by the distribution fitted to the respondents' observed data together with the response model. The integrated score equations are then solved for the unknown parameters indexing the response model. Once the response probabilities have been estimated, we impute the missing outcomes from their appropriate distribution. This yields a complete data set with no missing values, which is used to predict the target area means. A parametric bootstrap procedure is developed for assessing the mean squared errors (MSE) of the resulting predictors. We illustrate the approach with a small simulation study.
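The notation below is generic and assumed here, not taken from the paper. Let $\mathbf{y}_i$ be the outcome vector of unit $i$, split into observed and missing parts $\mathbf{y}_{i,\mathrm{obs}}$ and $\mathbf{y}_{i,\mathrm{mis}}$. Let $r_i$ be its observed response pattern, and let $\psi$ index the response model $P(R_i = r_i \mid \mathbf{y}_i; \psi)$. The two MIP steps described in the abstract then take roughly the following form; the exact likelihood used in the study may differ.

$$
S_{\mathrm{full}}(\psi) = \sum_{i=1}^{n} \frac{\partial}{\partial \psi} \log P\left(R_i = r_i \mid \mathbf{y}_i; \psi\right)
$$

$$
S(\psi) = \sum_{i=1}^{n} \int \frac{\partial}{\partial \psi} \log P\left(R_i = r_i \mid \mathbf{y}_{i,\mathrm{obs}}, \mathbf{y}_{i,\mathrm{mis}}; \psi\right) f\left(\mathbf{y}_{i,\mathrm{mis}} \mid \mathbf{y}_{i,\mathrm{obs}}, R_i = r_i\right) d\mathbf{y}_{i,\mathrm{mis}} = 0
$$

For a unit with no missing components, the integral is trivial and its term equals the corresponding term of $S_{\mathrm{full}}(\psi)$. Here $f(\cdot \mid \mathbf{y}_{i,\mathrm{obs}}, R_i = r_i)$ denotes the distribution holding for the missing data. The abstract defines it through the model fitted to the respondents' observed data together with the response model. The estimate $\hat\psi$ solves $S(\psi) = 0$.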
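The parametric bootstrap idea for MSE estimation can be illustrated with a deliberately simplified Python sketch. It fits a model, repeatedly simulates new samples and true area means from the fitted model, recomputes the predictors, and averages the squared errors. The nested-error model $y_{ij} = \mu + u_i + e_{ij}$, the moment estimators, the shrinkage predictor, and all function names are assumptions chosen for illustration. The sketch assumes complete data and does not model NMAR nonresponse, the response model, or the imputation step.

```python
import numpy as np


def fit(data):
    """Moment estimates (mu, sigma_u^2, sigma_e^2) for y_ij = mu + u_i + e_ij."""
    sizes = np.array([len(y) for y in data])
    means = np.array([y.mean() for y in data])
    # Pooled within-area variance estimates sigma_e^2.
    sigma_e2 = sum(((y - y.mean()) ** 2).sum() for y in data) / (sizes.sum() - len(data))
    # Spread of the area means minus their average sampling variance estimates
    # sigma_u^2 (floored at a small positive value).
    sigma_u2 = max(means.var(ddof=1) - np.mean(sigma_e2 / sizes), 1e-8)
    return means.mean(), sigma_u2, sigma_e2


def predict(data, theta):
    """Shrinkage predictor of each area mean: mu + g_i * (ybar_i - mu)."""
    mu, sigma_u2, sigma_e2 = theta
    preds = []
    for y in data:
        g = sigma_u2 / (sigma_u2 + sigma_e2 / len(y))
        preds.append(mu + g * (y.mean() - mu))
    return np.array(preds)


def simulate(theta, sizes, rng):
    """Draw bootstrap samples and the corresponding true area means mu + u_i."""
    mu, sigma_u2, sigma_e2 = theta
    u = rng.normal(0.0, np.sqrt(sigma_u2), size=len(sizes))
    data = [mu + u_i + rng.normal(0.0, np.sqrt(sigma_e2), size=n)
            for u_i, n in zip(u, sizes)]
    return data, mu + u


def bootstrap_mse(data, n_boot=500, seed=0):
    """Parametric bootstrap estimate of the MSE of each area's predictor."""
    rng = np.random.default_rng(seed)
    sizes = [len(y) for y in data]
    theta_hat = fit(data)
    total = np.zeros(len(data))
    for _ in range(n_boot):
        boot_data, true_means = simulate(theta_hat, sizes, rng)
        # Re-estimate the parameters on each bootstrap sample so that the
        # MSE reflects parameter-estimation error as well.
        total += (predict(boot_data, fit(boot_data)) - true_means) ** 2
    return total / n_boot


# Toy data: four areas with different sample sizes.
rng = np.random.default_rng(1)
sizes = [5, 10, 20, 40]
data = [2.0 + rng.normal(0.0, 1.0) + rng.normal(0.0, 2.0, size=n) for n in sizes]
print(bootstrap_mse(data))
```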