Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2024-RS-2024-00436773).
Abstract: Artificial intelligence is reshaping radiology by enabling automated report generation, yet evaluating the clinical accuracy and relevance of these reports remains challenging, because traditional natural language generation metrics such as BLEU and ROUGE prioritize lexical overlap over clinical relevance. To address this gap, we propose a novel semantic assessment framework for evaluating the accuracy of artificial intelligence-generated radiology reports against ground truth references. We trained the R2GenRL model on 5229 image–report pairs from the Indiana University chest X-ray dataset and generated a benchmark dataset from test data drawn from the Indiana University chest X-ray and MIMIC-CXR datasets. These datasets were selected for their public availability, large scale, and comprehensive coverage of diverse clinical cases in chest radiography, enabling robust evaluation and comparison with prior work. Results demonstrate that the Mistral model, particularly with task-oriented prompting, achieves superior performance (up to 91.9% accuracy), surpassing other models and closely aligning with established metrics such as BERTScore-F1 (88.1%) and CLIP-Score (88.7%). Statistical analyses, including paired t-tests (p < 0.01) and analysis of variance (p < 0.05), confirm significant improvements driven by structured prompting. Failure case analysis reveals limitations, such as over-reliance on lexical similarity, underscoring the need for domain-specific fine-tuning. This framework advances the evaluation of artificial intelligence-driven (AI-driven) radiology report generation, offering a robust, clinically relevant metric for assessing semantic accuracy and paving the way for more reliable automated systems in medical imaging.
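The abstract's point that lexical-overlap metrics can miss clinical meaning can be illustrated with a minimal sketch. The `unigram_f1` function below is a simplified stand-in for BLEU/ROUGE-style scoring, not either metric itself, and the three sentences are invented for illustration; none of this comes from the paper or its datasets.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-level F1 overlap (a simplified, hypothetical stand-in for BLEU/ROUGE)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented example sentences (assumptions, not taken from any dataset).
reference = "no evidence of pleural effusion or pneumothorax"
contradiction = "evidence of pleural effusion and pneumothorax"   # opposite finding
paraphrase = "lungs are clear without fluid or collapsed lung"    # roughly the same finding

score_contradiction = unigram_f1(contradiction, reference)
score_paraphrase = unigram_f1(paraphrase, reference)

print(f"contradiction: {score_contradiction:.3f}")
print(f"paraphrase:    {score_paraphrase:.3f}")
```

Under this toy metric the contradictory report scores well above the clinically similar paraphrase, because it shares most of its words with the reference. That is the kind of mismatch a semantic assessment is meant to catch.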
Funding: Supported by the National Natural Science Foundation (No. 40271088) and the Research Fund of the International Institute for Geo-Information Science and Earth Observation.
Abstract: This paper focuses on the issues of categorical database generalization and emphasizes the roles of the supporting data model, integrated data model, spatial analysis and semantic analysis in database generalization. The framework contents of categorical database generalization transformation are defined. This paper presents an integrated spatial supporting data structure, a semantic supporting model and a similarity model for categorical database generalization. The concept of a transformation unit is proposed for generalization.