Journal Articles
4 articles found
Singularity of Software Reliability Models LVLM and LVQM
1
Authors: XU Ren-zuo, ZHOU Rui, YANG Xiao-qing. Wuhan University Journal of Natural Sciences (EI, CAS), 2000, No. 2, pp. 150-154 (5 pages)
According to the principle that "the failure data is the basis of software reliability analysis", we built a software reliability expert system (SRES) by adopting artificial intelligence technology. By reasoning from the fitting results of the failure data of a software project, the SRES can recommend to users "the most suitable model" as the software reliability measurement model. We believe that the SRES can well overcome the inconsistency in applications of software reliability models. We report investigation results on the singularity and parameter estimation methods of the models LVLM and LVQM.
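The abstract does not describe the fitting procedure; the following is only a minimal sketch of the model-selection idea it alludes to (the candidate model forms, the least-squares criterion, and the example data are assumptions for illustration, not taken from the paper or from the LVLM/LVQM definitions):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical candidate mean-value functions m(t) for software reliability
# growth models; the actual LVLM/LVQM forms are not given in the abstract.
def goel_okumoto(t, a, b):          # exponential NHPP: m(t) = a * (1 - e^{-b t})
    return a * (1.0 - np.exp(-b * t))

def delayed_s_shaped(t, a, b):      # delayed S-shaped NHPP
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

CANDIDATES = {"Goel-Okumoto": goel_okumoto, "Delayed S-shaped": delayed_s_shaped}

def recommend_model(times, cumulative_failures):
    """Fit each candidate to the failure data and return the best by SSE."""
    best_name, best_sse = None, float("inf")
    for name, m in CANDIDATES.items():
        try:
            params, _ = curve_fit(m, times, cumulative_failures,
                                  p0=[max(cumulative_failures), 0.1], maxfev=5000)
        except RuntimeError:
            continue  # fit failed to converge; skip this candidate
        sse = float(np.sum((m(times, *params) - cumulative_failures) ** 2))
        if sse < best_sse:
            best_name, best_sse = name, sse
    return best_name, best_sse

# Example: cumulative failure counts observed at the end of each test week.
t = np.arange(1, 11, dtype=float)
failures = np.array([5, 9, 13, 16, 18, 20, 21, 22, 22, 23], dtype=float)
print(recommend_model(t, failures))
```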
Keywords: software reliability measurement models; software reliability expert system; singularity; parameter estimation method; LVLM; LVQM
Mitigating Object Hallucination in Large Vision-Language Models via Image Contrastive Enhancement
2
Authors: 卜立平, 常贵勇, 于碧辉, 刘大伟, 魏靖烜, 孙林壮, 刘龙翼. 《计算机系统应用》 (Computer Systems & Applications), 2025, No. 5, pp. 107-115 (9 pages)
Large vision-language models (LVLMs) have demonstrated remarkable abilities in understanding visual information and expressing it in language. However, during question answering, LVLMs often suffer from object hallucination: the generated text appears plausible yet does not match the information in the image, creating a mismatch between text and image. Through experiments, we find that missing object attention is a key cause of object hallucination. To mitigate it, we introduce an image contrastive enhancement method (ICE). ICE is a training-free, easy-to-apply method that contrasts the output distributions produced from the original visual input and an enhanced visual input, effectively improving the model's perception of the image and ensuring that the generated content is closely grounded in the visual input, thereby producing contextually consistent and accurate output. Experimental results show that ICE significantly alleviates object hallucination across different LVLMs without extra training or external tools, and it also performs well on the MME benchmark for large vision-language models, verifying its broad applicability and effectiveness. Code: ChangGuiyong/ICE.
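The abstract only states that ICE contrasts the output distributions of the original and enhanced visual inputs; below is a minimal sketch of that contrastive-decoding idea (the logit-combination formula, the `alpha` weight, and the model interface are assumptions for illustration, not the paper's actual implementation):

```python
import torch

def contrastive_next_token_logits(model, tokens, image, enhanced_image, alpha=1.0):
    """Combine next-token logits from the original and an enhanced view of the image.

    Assumed interface: model(input_ids, image) -> logits of shape
    [batch, seq_len, vocab]. The (1 + alpha) / alpha combination mirrors common
    visual contrastive decoding; the paper's exact formula may differ.
    """
    with torch.no_grad():
        logits_orig = model(tokens, image)[:, -1, :]          # original view
        logits_enh = model(tokens, enhanced_image)[:, -1, :]  # enhanced view
    # Emphasize what the enhanced view supports over the original view.
    return (1 + alpha) * logits_enh - alpha * logits_orig

def greedy_decode(model, tokens, image, enhanced_image, eos_id, max_new_tokens=64):
    """Greedy decoding that applies the contrastive logits at every step."""
    for _ in range(max_new_tokens):
        logits = contrastive_next_token_logits(model, tokens, image, enhanced_image)
        next_id = logits.argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=-1)
        if next_id.item() == eos_id:
            break
    return tokens
```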
Keywords: large vision-language model; object hallucination; image contrastive enhancement; artificial intelligence
Effectiveness assessment of recent large vision-language models (Cited by: 1)
3
Authors: Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan. Visual Intelligence, 2024, No. 1, pp. 197-213 (17 pages)
The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the models' effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models. To gauge their effectiveness in specialized tasks, we employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial. These six tasks include salient/camouflaged/transparent object detection, as well as polyp detection, skin lesion detection, and industrial anomaly detection. We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks. Moreover, we conduct empirical investigations utilizing the aforementioned LVLMs together with GPT-4V, assessing their multi-modal understanding capabilities in general tasks including object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these LVLMs demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deep into this inadequacy and uncover several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope that this study can provide useful insights for the future development of LVLMs, helping researchers improve LVLMs for both general and specialized applications.
Keywords: large vision-language models (LVLMs); recognition; localization; multi-modal understanding
KnowBench: Evaluating the Knowledge Alignment on Large Visual Language Models
4
Authors: Zheng Ma, Hao-Tian Yang, Jian-Bing Zhang, Jia-Jun Chen. Journal of Computer Science & Technology, 2025, No. 5, pp. 1209-1219 (11 pages)
Large visual language models (LVLMs) have revolutionized the multimodal domain, demonstrating exceptional performance in tasks requiring fusing visual and textual information. However, current evaluation benchmarks fail to adequately assess the knowledge alignment between images and text, focusing primarily on answer accuracy rather than the reasoning processes behind it. To address this gap and enhance the understanding of LVLMs' capabilities, we introduce KnowBench, a novel benchmark designed to assess the alignment of knowledge between images and text for LVLMs. KnowBench comprises 1081 image-question pairs, each with four options and four pieces of corresponding knowledge, across 11 major categories. We evaluate mainstream LVLMs on KnowBench, including proprietary models such as Gemini, Claude, and GPT, and open-source models such as LLaVA, Qwen-VL, and InternVL. Our experiments reveal a notable discrepancy between the models' abilities to select correct answers and to select the corresponding knowledge, whether the models are open-source or proprietary. This indicates that a significant gap remains in current LVLMs' knowledge alignment between images and text. Furthermore, our analysis shows that model performance on KnowBench improves with increased parameters and version iterations, indicating that scaling laws have a significant impact on multimodal knowledge alignment and that model iteration by researchers also has a positive effect. We anticipate that KnowBench will foster the development of LVLMs and motivate researchers to develop more reliable models. We have made our dataset publicly available at https://doi.org/10.57760/sciencedb.29672.
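KnowBench's scoring protocol is not detailed in the abstract; the following is only a rough sketch of how the answer/knowledge discrepancy it describes could be measured (the record fields and the joint-accuracy metric are assumptions for illustration, not the benchmark's actual protocol):

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One benchmark item plus a model's two selections (hypothetical fields)."""
    answer_gt: str        # ground-truth option, e.g. "B"
    knowledge_gt: str     # ground-truth supporting-knowledge option
    answer_pred: str      # option chosen by the model
    knowledge_pred: str   # knowledge piece chosen by the model

def knowledge_alignment_report(records):
    """Answer accuracy, knowledge accuracy, and how often both are correct."""
    n = len(records)
    ans_ok = sum(r.answer_pred == r.answer_gt for r in records)
    kno_ok = sum(r.knowledge_pred == r.knowledge_gt for r in records)
    both_ok = sum(r.answer_pred == r.answer_gt and
                  r.knowledge_pred == r.knowledge_gt for r in records)
    return {
        "answer_acc": ans_ok / n,
        "knowledge_acc": kno_ok / n,
        "joint_acc": both_ok / n,                  # both selections correct
        "discrepancy": (ans_ok - both_ok) / n,     # right answer, wrong knowledge
    }

# Example with two toy records.
recs = [Record("B", "K2", "B", "K3"), Record("A", "K1", "A", "K1")]
print(knowledge_alignment_report(recs))
```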
Keywords: large visual language model (LVLM); knowledge alignment; image and text fusion; evaluation benchmark