Artificial Intelligence Corpus Library: Connotations, Functional Requirements, and Construction Pathways

Abstract [Purpose/Significance] Against the backdrop of the accelerating development of AI for Science (AI4S) and the national "AI + Science and Technology" strategy, high-quality, computable corpus resources have become a core element supporting large-model pretraining and intelligent scientific discovery. As social infrastructure, libraries face a practical need to transform their functions toward the construction and provision of AI-oriented corpus resources and services. This article defines the connotation and core functions of the "AI corpus library" in order to provide a reference for related theoretical research and construction practice. [Method/Process] Through conceptual and evolutionary analysis, the article clarifies the essential characteristics of the AI corpus library, interpreting it as a deep integration and functional reconstruction of "Database × Corpus × Digital Library". It analyzes functional requirements along three dimensions: technological evolution, social application, and regulatory governance. Drawing on the "text-as-data" model of HathiTrust in the United States and the "digital scholarship" practice of the British Library in the United Kingdom, it summarizes the logic of the transition from digital libraries to AI corpus libraries. [Result/Conclusion] The study finds that the AI corpus library is a new type of knowledge infrastructure oriented toward "human-machine agent" entities, with multimodal, computable corpora as its core objects. It supports the efficient and stable operation of intelligent infrastructure and is one effective way to realize AI governance. Its construction should follow an architecture with data-driven design as the top-level logic, knowledge organization as the intermediate mechanism, and agent-based applications as the functional innovation. By upgrading existing library collection data into corpora, establishing a value-added service model and governance mechanism based on "non-consumptive use", and embedding corpus services into digital scholarship workflows, libraries can achieve functional expansion and intelligent upgrading.
Authors: Liu Xiwen, Qian Li, Tu Zhifang (National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Information Resource Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190)
Source: Library and Information Service (《图书情报工作》), Peking University Core Journal, 2026, No. 4, pp. 3-12 (10 pages)
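Funding: This article is one of the research outcomes of the Key Program of the National Natural Science Foundation of China, "Theoretical Transformation of Scientific and Technological Information Resources and Knowledge Management Empowered by Digital Intelligence" (Project No. 72234005).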
Keywords: artificial intelligence; AI corpus library; high-quality data; knowledge infrastructure; AI governance