Abstract: The emergence of Medical Large Language Models (Med-LLMs) has significantly transformed healthcare. Med-LLMs serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets, and provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with their best-suited applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses the deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmark datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62272253 and 62272252) and the Fundamental Research Funds for the Central Universities. It was also supported in part by the China Scholarship Council (CSC202406200085) and the Innovation Project of Guangxi Graduate Education (YCBZ2024005).
Abstract: As various types of data grow explosively, large-scale data storage, backup, and transmission become challenging, which has motivated many researchers to propose efficient universal compression algorithms for multi-source data. In recent years, hardware acceleration devices such as GPUs, TPUs, DPUs, and FPGAs have overcome the performance bottleneck of neural networks (NNs), making NN-based compression algorithms increasingly practical and popular. However, no survey of NN-based universal lossless compressors has yet been conducted, and unified evaluation metrics are also lacking. To address these problems, this paper presents a holistic survey together with benchmark evaluations. Specifically, i) we thoroughly investigate NN-based lossless universal compression algorithms for multi-source data and classify them into three types: static pre-training, adaptive, and semi-adaptive; ii) we unify 19 evaluation metrics to comprehensively assess the compression effect, resource consumption, and model performance of compressors; iii) we conduct experiments totaling more than 4,600 CPU/GPU hours to evaluate 17 state-of-the-art compressors on 28 real-world datasets spanning text, images, videos, audio, and other data types; and iv) we summarize the strengths and drawbacks of NN-based lossless data compressors and discuss promising research directions. We summarize the results as the NN-based Lossless Compressors Benchmark (NNLCB; see the fahaihi.github.io/NNLCB website), which will be continuously updated and maintained.
Funding: Supported by the National Natural Science Foundation of China under Grant No. 62176115.
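To make the "adaptive" category of the compression survey more concrete, here is a minimal sketch under loudly stated assumptions. A simple add-one (Laplace) order-0 byte-frequency model stands in for the neural predictor that the surveyed compressors actually use, and the function names `adaptive_code_length_bits` and `bits_per_byte` are hypothetical. Rather than emitting a bitstream, the sketch sums the ideal code length an entropy coder such as an arithmetic coder would approach, reported as bits per byte; whether this figure is among the survey's 19 metrics is not stated here.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length, in bits, of `data` under a toy adaptive model.

    Assumption: an add-one (Laplace) order-0 byte-frequency model stands in for
    the neural predictor of a real adaptive compressor. The model starts empty,
    predicts each byte before seeing it, and is updated afterwards, so a decoder
    can replay the same updates and no model parameters need to be stored.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for byte in data:
        p = (counts[byte] + 1) / (seen + 256)  # prediction made before coding
        bits -= math.log2(p)                   # cost an ideal entropy coder pays
        counts[byte] += 1                      # online update, mirrored by decoder
        seen += 1
    return bits


def bits_per_byte(data: bytes) -> float:
    """Average ideal code length per input byte (8.0 means no compression)."""
    return adaptive_code_length_bits(data) / len(data) if data else 0.0


if __name__ == "__main__":
    print(f"repetitive: {bits_per_byte(b'a' * 1000):.3f} bpb")
    print(f"all 256 byte values once: {bits_per_byte(bytes(range(256))):.3f} bpb")
```

The repetitive input codes far below 8 bits per byte, while a sequence containing each byte value exactly once costs slightly more than 8, which is the price an adaptive model pays for learning online from nothing. Under the usual reading of the other two terms (not definitions taken from the survey), a static pre-training compressor would instead ship a model trained in advance and keep it fixed, and a semi-adaptive one would fit the model to the input first and store its parameters alongside the compressed stream.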
Abstract: Large visual language models (LVLMs) have revolutionized the multimodal domain, demonstrating exceptional performance in tasks that require fusing visual and textual information. However, current evaluation benchmarks fail to adequately assess the knowledge alignment between images and text, focusing primarily on answer accuracy rather than on the reasoning processes behind the answers. To address this gap and deepen the understanding of LVLMs' capabilities, we introduce KnowBench, a novel benchmark designed to assess the alignment of knowledge between images and text in LVLMs. KnowBench comprises 1081 image-question pairs across 11 major categories, each with four options and four corresponding pieces of knowledge. We evaluate mainstream LVLMs on KnowBench, including proprietary models such as Gemini, Claude, and GPT, and open-source models such as LLaVA, Qwen-VL, and InternVL. Our experiments reveal a notable discrepancy between the models' ability to select the correct answer and their ability to select the corresponding knowledge, regardless of whether the models are open-source or proprietary. This indicates that a significant gap remains in current LVLMs' knowledge alignment between images and text. Further analysis shows that model performance on KnowBench improves with larger parameter counts and successive version iterations, indicating that scaling laws have a significant impact on multimodal knowledge alignment and that iterative model development also has a positive effect. We anticipate that KnowBench will foster the development of LVLMs and motivate researchers to develop more reliable models. Our dataset is publicly available at https://doi.org/10.57760/sciencedb.29672.
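The discrepancy KnowBench reports, between choosing the right answer and choosing the knowledge behind it, can be pictured with a small scoring sketch. Everything in it is an illustrative assumption: the `ScoredItem` record, the `alignment_scores` helper, and the joint-accuracy and gap measures are hypothetical and are not KnowBench's published data format or metrics. The sketch also assumes that exactly one of the four options and exactly one of the four knowledge statements is correct for each image-question pair.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ScoredItem:
    """One image-question pair (hypothetical layout, not KnowBench's format).

    Indices 0-3 refer to the four answer options and the four knowledge
    statements; exactly one of each is assumed to be correct.
    """
    gold_answer: int
    gold_knowledge: int
    pred_answer: int
    pred_knowledge: int


def alignment_scores(items: Sequence[ScoredItem]) -> Dict[str, float]:
    """Answer accuracy, knowledge accuracy, joint accuracy, and their gap."""
    if not items:
        raise ValueError("no items to score")
    n = len(items)
    answer_ok = [it.pred_answer == it.gold_answer for it in items]
    knowledge_ok = [it.pred_knowledge == it.gold_knowledge for it in items]
    joint_ok = [a and k for a, k in zip(answer_ok, knowledge_ok)]
    return {
        "answer_acc": sum(answer_ok) / n,
        "knowledge_acc": sum(knowledge_ok) / n,
        "joint_acc": sum(joint_ok) / n,
        # A positive gap means the model picks right answers more often than
        # it picks the knowledge that supports them.
        "answer_minus_knowledge": (sum(answer_ok) - sum(knowledge_ok)) / n,
    }


if __name__ == "__main__":
    demo = [
        ScoredItem(0, 0, 0, 0),  # answer and knowledge both correct
        ScoredItem(1, 1, 1, 2),  # right answer, wrong knowledge
        ScoredItem(2, 3, 2, 0),  # right answer, wrong knowledge
        ScoredItem(3, 0, 1, 0),  # wrong answer, right knowledge
    ]
    print(alignment_scores(demo))
```

On the four-item demo, answer accuracy is 0.75 while knowledge accuracy is 0.5 and joint accuracy is only 0.25. Reporting the joint score alongside plain answer accuracy is one way to keep a model from getting credit for answers it cannot ground in the right knowledge.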