Abstract
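When the output of generative artificial intelligence is substantially similar to copyrighted material, its legal characterization must start from the nature of the technology. Generative AI is built on large models, which decompose training data samples into tokens to determine statistical correlations among content features; the output process is a stochastic recombination of features exhibiting those statistical correlations. A large model interacts with users through an application programming interface to produce works that humans can understand, but under particular circumstances it may reproduce copyrighted material from the training data verbatim. A large model is not a tool for copying works and falls outside the scope of the substantial non-infringing use standard of the "Sony case," but it is a dual-use technology. The large model neither actively outputs nor stores the generated content; it is the user who enters prompts, parameters, and the like and selects the final product. Most of its application scenarios resemble those of a search engine, and it is a new type of information society service; it should therefore be subject to "safe harbor" rules such as the notice-and-takedown regime, and does not bear the obligations imposed by network governance rules under public law. A service provider that, in developing and deploying a large model, has taken necessary measures that are technically feasible and take fair use into account is not liable for damages. A large model's output of generated content resembles a user obtaining results through a search engine; the "safe harbor" rules therefore cannot be extended to the dissemination and commercial exploitation of the generated content, which would constitute direct copyright infringement.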
Where the output of generative AI (GenAI) is substantially similar to copyrighted material, the application of copyright law must start from the nature of the technology. Built on large language models (LLMs), GenAI decomposes samples of training data into tokens to determine statistical correlations among content features, and the output process is a stochastic recombination of such features. An LLM, which interacts with the user through an API, outputs original works that humans can understand, but it may also "extract" copyrighted material from the training data under certain circumstances. An LLM is not a tool for reproducing works and does not qualify for the substantial non-infringing uses exemption in the Sony case. It is a dual-use technology that neither actively produces nor stores such outputs. It is the user who inputs prompts, parameters, and the like and selects the final product. Its major application scenarios are similar to those of a search engine; as a new type of service in the information society, it should be subject to the notice-and-takedown system rather than to the obligations imposed by network governance rules under public law. A provider designing and/or deploying an LLM should take the necessary measures to block the output of copyrighted material in a manner that is technically feasible and takes the fair use doctrine into account. However, this rule does not extend to the dissemination and commercial exploitation of outputs.
Source
《法学》
Peking University Core Journal (北大核心)
2025, No. 6, pp. 129-146 (18 pages)
Law Science
Funding
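An interim result of the National Social Science Fund of China Major Project "Research on the Main Challenges Facing Global Intellectual Property Governance and China's Solutions" (23ZDA124).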