Funding: Supported by the General Program of the National Natural Science Foundation of China (Grant No. 62277022).
Abstract: Algorithms are the primary component of artificial intelligence (AI). In AI, an algorithm is the process that imitates the human mind to solve problems. Currently, the performance of AI is evaluated through the metric scores that AI algorithms achieve on data sets. However, evaluating AI algorithms is challenging because algorithms of the same type are evaluated on many data sets and with many metrics. Different algorithms may show individual strengths and weaknesses in metric scores on separate data sets, which undermines the credibility and validity of the evaluation. Moreover, evaluating algorithms requires repeated experiments on different data sets, diverting researchers' attention from the algorithms themselves. Crucially, comparing metric scores does not take into account an algorithm's ability to solve problems. The classical evaluation of algorithms by time and space complexity is also unsuitable for AI algorithms, because the inputs of classical algorithms range over an infinite set of numbers, whereas the input of an AI algorithm is a data set, which is limited and varied. Because current AI algorithm evaluation does not reflect problem-solving capability, this paper summarizes the features of AI algorithm evaluation and proposes an AI evaluation method that incorporates the problem-solving capabilities of algorithms.
Funding: Shanxi Normal University Graduate Innovation Project (2024XSY31).
Abstract: This paper takes Chinese red culture resources as its research subject and evaluates the Chinese-English translation quality of three major AI platforms: ChatGPT-4.0, ERNIE Bot, and DeepSeek. Using automatic quantitative evaluation, it systematically analyzes their performance in translating red culture texts. The study draws on a diverse range of corpora, including historical documents, red classic texts, and culturally loaded terms, and applies three automatic evaluation metrics (GLEU, METEOR, and COMET) for a comprehensive assessment.
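The abstract names GLEU without defining it. As a rough sketch, assuming the Google-BLEU variant introduced by Wu et al. (2016) for sentence-level machine translation scoring (the abstract does not state which GLEU variant was used), the score pools all 1- to 4-gram matches between a candidate translation and a single reference and takes the minimum of n-gram precision and n-gram recall. The function names below are illustrative; NLTK offers a comparable `sentence_gleu` in `nltk.translate.gleu_score`.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_gleu(reference, hypothesis, min_len=1, max_len=4):
    """Sentence-level Google-BLEU: min(n-gram precision, n-gram recall),
    with n-grams of every order from min_len to max_len pooled together."""
    ref_counts, hyp_counts = Counter(), Counter()
    for n in range(min_len, max_len + 1):
        ref_counts.update(ngrams(reference, n))
        hyp_counts.update(ngrams(hypothesis, n))
    # Clipped matches: an n-gram counts at most as often as it occurs on both sides.
    matches = sum((ref_counts & hyp_counts).values())
    total_hyp = sum(hyp_counts.values())
    total_ref = sum(ref_counts.values())
    if total_hyp == 0 or total_ref == 0:
        return 0.0
    return min(matches / total_hyp, matches / total_ref)
```

For example, with the reference "the long march began in 1934" and the candidate "the long march started in 1934", 9 of the 18 pooled n-grams on each side match, so `sentence_gleu` returns 0.5. Taking the minimum of precision and recall penalizes both padded and truncated translations, which is why this variant is often preferred over plain BLEU at the sentence level.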
Funding: Supported by the National Science and Technology Major Project (Grant No. 2022ZD0114900), the National Natural Science Foundation of China (Grant Nos. 32471151, 32200854), and the Young Elite Scientists Sponsorship Program (Grant No. 2021QNRC00) to Yujia PENG.
Abstract: With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that span a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a quantification system to assess ability decompositions or difficulty levels. Here, we take the visual domain as a starting point and propose an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). Using large language models, TADDL-V decomposes the visual abilities required for a given task and leverages statistical data to map between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V's task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we propose an AGI visual evaluation task set, AGI-V70, comprising 70 composite visual tasks that incorporate visual abilities across a broad spectrum of task difficulties. Together, TADDL-V serves as a prototype for ability decomposition and task difficulty level quantification, both of which are essential for future AGI evaluations.
Abstract: This paper describes the use of artificial intelligence (AI) techniques to identify an optimal machine learning (ML) model for predicting dodecane fuel consumption in diesel combustion. The study incorporates sensitivity analysis to assess how strongly various parameters affect fuel consumption, thereby highlighting the most influential factors. In addition, it addresses the impact of noise and applies data cleaning techniques to ensure the reliability of the results. To validate the accuracy of the predictions, the study applies several metrics and validation processes, including comparisons with computational fluid dynamics (CFD) results and experimental data. Comprehensive comparisons are made among neural network (NN), random forest regression (RFR), and Gaussian process regression (GPR) models, taking into account the complexity of fuel consumption prediction. The findings show that the GPR model is the most accurate, as measured by mean absolute error (MAE), mean squared error (MSE), the Pearson coefficient (PC), and R-squared (R2). The GPR model exhibits superior predictive ability, accurately capturing even individual data points that deviate from the overall trend, and its consistently lower absolute errors confirm its higher accuracy relative to the NN and RFR models. Furthermore, the GPR model runs approximately 1.7 times faster than traditional CFD solvers and physically captures the momentum and thermal characteristics in a surface field prediction. Finally, target optimization is assessed using the Euclidean distance as a fitness function, ensuring the reliability of the predicted data.
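For reference, the four accuracy metrics named above and a Euclidean-distance fitness can be written out directly. The following is a minimal plain-Python sketch, not the authors' code: the function names, the dictionary keys, and the use of a single target vector in `euclidean_fitness` are assumptions for illustration. R² is computed as 1 minus the ratio of the residual sum of squares to the total sum of squares, and the Pearson coefficient as the covariance divided by the product of the two standard deviations.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, Pearson coefficient (PC) and R^2 for two equal-length sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in PC and R^2.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("PC and R^2 are undefined for constant inputs")
    pearson = cov / math.sqrt(var_t * var_p)
    r2 = 1.0 - sum(e * e for e in errors) / var_t
    return {"MAE": mae, "MSE": mse, "PC": pearson, "R2": r2}


def euclidean_fitness(predicted, target):
    """Hypothetical fitness: Euclidean distance between a predicted vector and a
    target vector (smaller is fitter)."""
    if len(predicted) != len(target):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))
```

For example, `regression_metrics([1, 2, 3, 4], [1, 2, 3, 5])` gives MAE = MSE = 0.25 and R² = 0.8. In an optimization loop, the candidate with the smallest `euclidean_fitness` to the target operating point would be preferred.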