Funding: This work was supported by the National Key Research and Development Program under Grant 2024YFB4506200, the Science and Technology Innovation Program of Hunan Province under Grant 2024RC1048, and the National Key Laboratory Foundation Project under Grant 2024-KJWPDL-14.
Abstract: Aligning natural language with operating system (OS) commands allows users to perform complex computer tasks through simple natural language descriptions. However, because natural language is complex, achieving precise alignment remains challenging. In this paper, we present ComAlign, a Chinese benchmark dataset that pairs Chinese natural language descriptions with corresponding OS commands. ComAlign covers 82 distinct OS command types with a total of 1811 natural language descriptions. We describe the construction of ComAlign and build three baselines to evaluate alignment accuracy on it. Experimental results show that even advanced large language models struggle with certain ambiguously phrased OS commands. Specifically, the best-performing baseline achieves 46.9% alignment accuracy. We demonstrate that ComAlign is collected from real-world application scenarios, making it particularly suitable for developing and benchmarking intelligent OS and agent systems that support user-machine interaction through natural language.