Journal Articles
5 articles found
1. Using text/image Type Data in SQL Server
Author: 兰丽辉 (LAN Li-hui). 《电脑知识与技术(过刊)》 (Computer Knowledge and Technology, archived issues), 2007, No.14, pp.313, 315 (2 pages)
Abstract: When developing database software with MS SQL Server, access operations on text and image type data differ from those on other data types. Drawing on application examples, this paper introduces several common methods for working with text/image data, including Transact-SQL statements, API functions, the sp_tableoption stored procedure, the bcp utility, and the textcopy utility.
Keywords: MS SQL Server; text/image; Transact-SQL; API functions
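The access pattern this first abstract lists is easiest to see from a client program. Below is a minimal sketch, assuming a hypothetical ODBC DSN and a table `Docs(name, body)` with an `image` column, of writing and reading such a column from Python with pyodbc. It illustrates the general access pattern, not code from the article; modern SQL Server schemas would use `varbinary(max)` in place of the deprecated `image`/`text` types.

```python
# Minimal sketch: storing and fetching a binary file through a legacy
# SQL Server image column with pyodbc. The DSN, table, and column names
# are hypothetical; on current SQL Server, prefer varbinary(max).
import pyodbc

conn = pyodbc.connect("DSN=mydb;UID=user;PWD=secret")  # hypothetical DSN
cur = conn.cursor()

# Bound parameters let the driver send the bytes directly, instead of
# splicing a binary literal into the Transact-SQL statement.
with open("photo.jpg", "rb") as f:
    cur.execute("INSERT INTO Docs (name, body) VALUES (?, ?)",
                ("photo.jpg", f.read()))
conn.commit()

# Reading the column back returns it as a Python bytes object.
cur.execute("SELECT body FROM Docs WHERE name = ?", ("photo.jpg",))
blob = cur.fetchone()[0]
with open("copy.jpg", "wb") as f:
    f.write(blob)
```

For bulk transfer of many such rows, the bcp and textcopy utilities mentioned in the abstract operate outside the client connection entirely.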
2. The Image and Text Relationship in TANG Yin's Scroll of Poetry and Painting
Author: YE La-mei. Journal of Literature and Art Studies, 2011, No.1, pp.48-64 (17 pages)
Abstract: This paper seeks to examine the image and text relationship in TANG Yin's scroll of poetry and painting from three aspects: the first focuses upon the schema type of its image and text relationship in physical form; the second explores the text's/poetry's functions of anchorage and relay while appreciating those images/paintings; the third traces the semiosis process of the image, exploring how image and text, as cultural products in the epistemological world, mediate with the phenomenological world.
Keywords: image and text relationship; TANG Yin's scroll of poetry and painting; the semiosis process
3. Text to image generation with bidirectional Multiway Transformers
Authors: Hangbo Bao, Li Dong, Songhao Piao, Furu Wei. Computational Visual Media, 2025, No.2, pp.405-422 (18 pages)
Abstract: In this study, we explore the potential of Multiway Transformers for text-to-image generation to achieve performance improvements through a concise and efficient decoupled model design and the inference efficiency provided by bidirectional encoding. We propose a method for improving the image tokenizer using pretrained Vision Transformers. Next, we employ bidirectional Multiway Transformers to restore the masked visual tokens combined with the unmasked text tokens. On the MS-COCO benchmark, our Multiway Transformers outperform vanilla Transformers, achieving superior FID scores and confirming the efficacy of the modality-specific parameter computation design. Ablation studies reveal that the fusion of visual and text tokens in bidirectional encoding contributes to improved model performance. Additionally, our proposed tokenizer outperforms VQGAN in image reconstruction quality and enhances the text-to-image generation results. By incorporating the additional CC-3M dataset for intermediate finetuning of our model with 688M parameters, we achieve competitive results with a finetuned FID score of 4.98 on MS-COCO.
Keywords: text to image generation; VQ-VAE; Transformer; generative models
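To make the "modality-specific parameter computation" concrete: the core Multiway idea is shared self-attention over the fused text+image token sequence, with separate feed-forward experts per modality. The PyTorch sketch below illustrates that routing under assumed shapes and names (`MultiwayBlock`, `n_text`); it is a schematic of the design the abstract describes, not the paper's implementation.

```python
# Sketch of a Multiway block: one shared bidirectional self-attention over
# the concatenated text+image tokens, then modality-specific FFN experts.
import torch
import torch.nn as nn

class MultiwayBlock(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # Modality-specific parameters: one FFN for text, one for image.
        self.ffn_text = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                      nn.Linear(4 * dim, dim))
        self.ffn_image = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                       nn.Linear(4 * dim, dim))

    def forward(self, x, n_text):
        # Bidirectional (unmasked) attention over the full fused sequence.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x)
        # Route the first n_text tokens through the text expert, the rest
        # (masked/unmasked visual tokens) through the image expert.
        out = torch.cat([self.ffn_text(h[:, :n_text]),
                         self.ffn_image(h[:, n_text:])], dim=1)
        return x + out

x = torch.randn(2, 16 + 64, 768)   # 16 text tokens + 64 visual tokens
y = MultiwayBlock()(x, n_text=16)
print(y.shape)                     # torch.Size([2, 80, 768])
```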
4. Region-Aware Fashion Contrastive Learning for Unified Attribute Recognition and Composed Retrieval (Cited: 1)
Authors: WANG Kangping, ZHAO Mingbo. Journal of Donghua University (English Edition), CAS, 2024, No.4, pp.405-415 (11 pages)
Abstract: Clothing attribute recognition has become an essential technology, which enables users to automatically identify the characteristics of clothes and search for clothing images with similar attributes. However, existing methods cannot recognize newly added attributes and may fail to capture region-level visual features. To address the aforementioned issues, a region-aware fashion contrastive language-image pre-training (RaF-CLIP) model was proposed. This model aligned cropped and segmented images with category and multiple fine-grained attribute texts, achieving the matching of fashion region and corresponding texts through contrastive learning. Clothing retrieval found suitable clothing based on the user-specified clothing categories and attributes, and to further improve the accuracy of retrieval, an attribute-guided composed network (AGCN) as an additional component on RaF-CLIP was introduced, specifically designed for composed image retrieval. This task aimed to modify the reference image based on textual expressions to retrieve the expected target. By adopting a transformer-based bidirectional attention and gating mechanism, it realized the fusion and selection of image features and attribute text features. Experimental results show that the proposed model achieves a mean precision of 0.6633 for attribute recognition tasks and a recall@10 (recall@k is defined as the percentage of correct samples appearing in the top k retrieval results) of 39.18 for the composed image retrieval task, satisfying user needs for freely searching for clothing through images and texts.
Keywords: attribute recognition; image retrieval; contrastive language-image pre-training (CLIP); image text matching; transformer
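The contrastive matching step that RaF-CLIP's abstract describes (aligning cropped/segmented fashion regions with attribute texts) follows the standard CLIP recipe of a symmetric InfoNCE loss over paired embeddings. A minimal sketch is below; the encoders are stubbed with random tensors, and the function name and temperature value are illustrative assumptions.

```python
# Sketch of CLIP-style contrastive alignment between region-image and
# attribute-text embeddings. In RaF-CLIP the embeddings would come from
# the pretrained CLIP towers; here they are random stubs.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize so the dot product is a cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (N, N) similarity matrix
    targets = torch.arange(len(img_emb))           # i-th region matches i-th text
    # Symmetric cross-entropy: region->text and text->region directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

regions = torch.randn(8, 512)   # 8 fashion-region embeddings (stub)
texts = torch.randn(8, 512)     # 8 attribute-text embeddings (stub)
print(contrastive_loss(regions, texts))
```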
5. KnowBench: Evaluating the Knowledge Alignment on Large Visual Language Models
Authors: Zheng Ma, Hao-Tian Yang, Jian-Bing Zhang, Jia-Jun Chen. Journal of Computer Science & Technology, 2025, No.5, pp.1209-1219 (11 pages)
Abstract: Large visual language models (LVLMs) have revolutionized the multimodal domain, demonstrating exceptional performance in tasks requiring fusing visual and textual information. However, the current evaluation benchmarks fail to adequately assess the knowledge alignment between images and text, focusing primarily on answer accuracy rather than the reasoning processes behind them. To address this gap and enhance the understanding of LVLMs' capabilities, we introduce KnowBench, a novel benchmark designed to assess the alignment of knowledge between images and text for LVLMs. KnowBench comprises 1081 image-question pairs, each with four options and four pieces of corresponding knowledge across 11 major categories. We evaluate mainstream LVLMs on KnowBench, including proprietary models like Gemini, Claude, and GPT, and open-source models like LLaVA, Qwen-VL, and InternVL. Our experiments reveal a notable discrepancy in the models' abilities to select correct answers and the corresponding knowledge, whether the models are open-source or proprietary. This indicates that there is still a significant gap in current LVLMs' knowledge alignment between images and text. Furthermore, our analysis shows that model performance on KnowBench improves with increased parameters and version iterations. This indicates that scaling laws have a significant impact on multimodal knowledge alignment, and model iteration by researchers also has a positive effect. We anticipate that KnowBench will foster the development of LVLMs and motivate researchers to develop more reliable models. We have made our dataset publicly available at https://doi.org/10.57760/sciencedb.29672.
Keywords: large visual language model (LVLM); knowledge alignment; image and text fusing; evaluation benchmark
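KnowBench's central measurement is whether a model that picks the right answer also picks the right supporting knowledge. A tiny sketch of that joint scoring is below; the record fields (`pred_answer`, `gold_knowledge`, etc.) are assumed names for illustration, not the benchmark's actual schema.

```python
# Sketch of joint answer+knowledge scoring: the gap between plain answer
# accuracy and joint accuracy exposes knowledge misalignment. Field names
# are assumptions, not KnowBench's real data format.
items = [
    {"pred_answer": "B", "gold_answer": "B",
     "pred_knowledge": 2, "gold_knowledge": 2},
    {"pred_answer": "A", "gold_answer": "A",
     "pred_knowledge": 0, "gold_knowledge": 3},  # right answer, wrong rationale
]

answer_acc = sum(i["pred_answer"] == i["gold_answer"] for i in items) / len(items)
joint_acc = sum(i["pred_answer"] == i["gold_answer"] and
                i["pred_knowledge"] == i["gold_knowledge"]
                for i in items) / len(items)
print(f"answer acc: {answer_acc:.2f}, answer+knowledge acc: {joint_acc:.2f}")
```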