Model evaluation using benchmark datasets is an important method for measuring the capability of large language models (LLMs) in specific domains, and it is mainly used to assess their knowledge and reasoning abilities. To better assess the capability of LLMs in the agricultural domain, Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture. The assessment dataset used in Agri-Eval covers seven major disciplines in the agricultural domain: crop science, horticulture, plant protection, animal husbandry, forest science, aquaculture science, and grass science, and contains a total of 2283 questions. Among domestic general-purpose LLMs, DeepSeek R1 performed best, with an accuracy rate of 75.49%. Among international general-purpose LLMs, Gemini 2.0 pro exp 0205 stood out as the top performer, achieving an accuracy rate of 74.28%. As a vertical LLM for agriculture, Shennong V2.0 outperformed all the LLMs in China, and its answer accuracy on agricultural knowledge exceeded that of all existing general-purpose LLMs. The launch of Agri-Eval helps LLM developers comprehensively evaluate a model's capability in agriculture through a variety of tasks and tests, promoting the development of LLMs in the field.
Purpose: The "Norwegian model" has become widely used for assessment and resource allocation purposes. This paper investigates why this model has become so widespread and influential. Approach: A theoretical background is outlined in which the reduction of "uncertainty" is highlighted as a key feature of performance measurement systems. These theories are then drawn upon when revisiting previous studies of the Norwegian model, its use, and reactions to it, in Sweden. Findings: The empirical examples, which concern more formal use at the level of universities as well as responses from individual researchers, show how particular parts, especially the "publication indicator", are employed in Swedish academia. The discussion posits that the attractiveness of the Norwegian model can largely be explained by its ability to reduce complexity and uncertainty, even in fields where traditional bibliometric measurement is less applicable. Research limitations: The findings presented should be regarded as examples that can be used for discussion, but one should be careful not to interpret them as representative of broader sentiments and trends. Implications: The sheer popularity of the Norwegian model, leading to its application in contexts for which it was not designed, can be seen as a major challenge for the future. Originality: This paper offers a novel perspective on the Norwegian model by focusing on its general "appeal", rather than on its design, use, or (mis)use.
National assessment of speech synthesis systems for Chinese has been carried out regularly in China since 1994. New guidelines for the assessment activities, which aim to make the assessment work standardizable, (partially) automatable, and accessible to the public over a computer network, were set up in 1997. Two modules, the phonetic module and the linguistic module, are evaluated individually. The phonetic module is evaluated using speech intelligibility tests at three levels (syllable, word, and sentence) and speech naturalness tests (in MOS). For the linguistic module, the text processing ability, which covers word segmentation, polyphonic characters, numerals, years, symbols, and units of measurement, is examined automatically.
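The two phonetic-module measures named above can be illustrated with a minimal sketch. It is not the assessment's actual scoring procedure: the 1-to-5 rating scale is the common MOS convention and is assumed here, exact-match scoring of listener responses is also an assumption, and the function names and sample data are hypothetical.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings; a 1-5 scale is assumed (not stated in the abstract)."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


def intelligibility(responses, references):
    """Percentage of items the listener transcribed exactly as the reference."""
    if len(responses) != len(references):
        raise ValueError("responses and references must have the same length")
    correct = sum(1 for got, want in zip(responses, references) if got == want)
    return 100.0 * correct / len(references)


# Hypothetical data: four naturalness ratings and four syllable-level responses.
mos = mean_opinion_score([4, 3, 5, 4])
syllable_score = intelligibility(["ma", "ba", "ta", "da"], ["ma", "pa", "ta", "da"])
```

The same `intelligibility` function could be applied separately to syllable-, word-, and sentence-level items to obtain one score per level.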
A national assessment of the performance of speech synthesis systems for Chinese has been carried out yearly since 1994. The quality of the synthetic speech of five different systems was evaluated and diagnosed using speech intelligibility tests. The listeners were 16 college students (8 male, 8 female) with no experience with synthetic speech; they were asked to complete an open-response task with pencil and paper. In addition, speech naturalness was measured by Mean Opinion...
This research develops a novel cross-disciplinary framework that bridges financial systemic risk modeling with supply chain network analysis to advance resilience assessment and policy guidance. The approach integrates established financial contagion frameworks with the topology of the supply chain network, introducing the concept of "too central to fail" suppliers through systematic importance scoring methodologies. The framework reveals striking asymmetries in supply chain vulnerability patterns. While the majority of suppliers demonstrate systemic importance within network structures, financial fragility analysis indicates remarkable overall network robustness, with few nodes exhibiting high vulnerability thresholds. Most significantly, comprehensive stress testing exposes a critical paradox: networks demonstrate moderate resilience to random disruptions yet remain substantially vulnerable to strategic targeting of central nodes. Cascade failure analysis through multiple simulation approaches unveils the dual nature of supply chain risk propagation. Random shock scenarios generate manageable failure rates, while targeted attacks on high-centrality suppliers achieve disproportionate network impact. Most alarmingly, liquidity crisis simulations demonstrate how financial contagion mechanisms can affect nearly half of all network participants, highlighting the interconnected nature of operational and financial vulnerabilities. These findings establish quantitative foundations for the assessment of systemic risk in supply chains, with immediate implications for regulatory frameworks, early warning systems, and resilience enhancement strategies. The integrated financial-operational risk framework advances the theoretical understanding of the propagation of cross-sector vulnerability while providing systematic methodologies for identifying critical suppliers whose failure could trigger systemic collapse.
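The contrast between peripheral disruptions and targeted attacks on central suppliers can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify its importance scoring, so degree centrality is assumed here, the size of the largest connected component stands in for network impact, and the toy network and all names are hypothetical.

```python
from collections import deque


def largest_component(adj, removed):
    """Size of the largest connected component once the `removed` nodes are deleted."""
    seen = set(removed)  # treating removed nodes as visited excludes them from the search
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


# Hypothetical toy network: one hub supplier serves five firms; two firms also trade directly.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("hub", "e"), ("a", "b")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Assumed importance score: degree centrality (number of direct trading partners).
most_central = max(adj, key=lambda node: len(adj[node]))

targeted = largest_component(adj, {most_central})  # attack the most central supplier
peripheral = largest_component(adj, {"e"})         # lose a single peripheral firm
```

In this toy network, removing the hub leaves only the pair of firms that trade directly as the largest connected group, while removing a peripheral firm leaves the remaining five nodes connected.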
This paper presents a comparative analysis of the International Baccalaureate (IB) and the General Certificate of Education Advanced Level (A-Level), two international educational assessment systems. The study explores their similarities and differences in educational philosophy, curriculum design, assessment methods, and student experience. Findings indicate that the IB curriculum emphasizes holistic education and interdisciplinary learning, while the A-Level curriculum focuses more on subject depth and specialization. In terms of assessment methods, the IB combines internal and external evaluations, whereas the A-Level primarily relies on final examinations. Regarding student experience, IB students typically perceive a broader range of learning opportunities, while A-Level students gain in-depth knowledge in specific subject areas. The study offers valuable insights for educators and policymakers to improve educational practices and support the personalized development of students.
The Internet of Everything (IoE), which aims to realize information exchange and communication between anything and the Internet, has revolutionized our modern world. Serving as the driving force for devices in the IoE network, power supply systems play a fundamental role in the development of the IoE. However, due to the complexity, multifunctionality, and wide-scale deployment of diverse applications, power supply systems face great challenges, including distribution, connection, charging technologies, and management. In this review, some challenges and advances in the development of both power supply systems and their units are presented. At the overall system level, establishing sustainable and maintenance-free power supply systems through wireless connections, efficient power management, and integrated energy harvesting and storage systems is highlighted. Additionally, the main performance metrics of power supply units are discussed, including energy density, service life, and self-powering ability. Finally, some directions for power quality assessment at both the system and unit levels of power supply systems are presented, aiming to provide insight into the future development of high-performance power supply systems for the IoE.
Funding (Norwegian model study): supported by the Swedish Foundation for the Social Sciences and Humanities (Grant No. SGO14-1153:1).
Funding (Internet of Everything review): National Youth Talent Support Program; National Natural Science Foundation of China and Guangdong Province (Grant/Award Number: U1601216); Tianjin Natural Science Foundation (Grant/Award Number: 18JCJQJC46500); National Natural Science Foundation of China (Grant/Award Number: 51771134); National Science Foundation for Excellent Young Scholar (Grant/Award Number: 51722403).