With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that encompass a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a quantification system to assess ability decompositions or difficulty levels. Here, we took the visual domain as a starting point and proposed an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). Using large language models, TADDL-V decomposed the visual abilities required for a given task and leveraged statistical data to map between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V's task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we proposed an AGI visual evaluation task set, AGI-V70, comprising 70 composite visual tasks that incorporate visual abilities across a broad spectrum of task difficulties. Together, TADDL-V serves as a prototype for ability decomposition and task difficulty level quantification, which are essential for future AGI evaluations.
Funding: Supported by the National Science and Technology Major Project (Grant No. 2022ZD0114900), the National Natural Science Foundation of China (Grant Nos. 32471151, 32200854), and the Young Elite Scientists Sponsorship Program (Grant No. 2021QNRC00) to Yujia PENG.