In recent decades, control performance monitoring (CPM) has experienced remarkable progress in research and industrial applications. While CPM research has been investigated using various benchmarks, the historical data benchmark (HIS) has garnered the most attention due to its practicality and effectiveness. However, existing CPM reviews usually focus on the theoretical benchmark, and there is a lack of an in-depth review that thoroughly explores HIS-based methods. In this article, a comprehensive overview of HIS-based CPM is provided. First, we provide a novel static-dynamic perspective on the data-level manifestations of control performance underlying typical controller capacities, including regulation and servo: static and dynamic properties. The static property portrays time-independent variability in the system output, and the dynamic property describes temporal behavior driven by closed-loop feedback. Accordingly, existing HIS-based CPM approaches and their intrinsic motivations are classified and analyzed from these two perspectives. Specifically, two mainstream solutions for CPM are summarized, static analysis and dynamic analysis, which match data-driven techniques to actual control behavior. Furthermore, this paper also points out various opportunities and challenges faced by CPM in modern industry and provides promising directions in the context of artificial intelligence to inspire future research.
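The static property described above (time-independent output variability) is commonly quantified by comparing actual output variance against a benchmark variance, for example from a "golden" historical operating period. The following is a minimal illustrative sketch of such a variance-ratio index, not a specific method from the review; the data values are hypothetical.

```python
import statistics

def performance_index(output, benchmark_variance):
    """Variance-ratio performance index: eta = sigma_benchmark^2 / sigma_actual^2.

    Values near 1 indicate performance close to the benchmark; values near 0
    indicate excess variability. `benchmark_variance` could come from a
    well-performing historical period (HIS-style benchmark).
    """
    actual_variance = statistics.pvariance(output)
    return benchmark_variance / actual_variance

# Hypothetical controller error data: a well-tuned period vs. a degraded one.
good_period = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.1]
bad_period = [0.5, -0.8, 0.9, -0.4, 0.7, -0.6, 0.3, -0.5]

sigma2_benchmark = statistics.pvariance(good_period)
eta = performance_index(bad_period, sigma2_benchmark)
# eta << 1 signals degraded control performance relative to the benchmark period.
```

A static index of this kind deliberately ignores temporal ordering; dynamic-analysis methods would instead examine autocorrelation or other closed-loop temporal signatures.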
Research suggests that transient institutions, i.e., institutions with a short-term investment horizon, make management focus on short-term earnings goals. This study examines an incentive, in terms of CEO cash compensation, that explains why management concentrates on short-term earnings results when transient institutions hold high levels of ownership. Using quarterly consensus analysts' expectations as a proxy for short-term earnings benchmarks, the author finds that CEO cash compensation and the frequency with which management misses quarterly earnings benchmarks in a year (MISSNUMt) are more strongly negatively associated in firms with high transient institutional ownership than in firms with low transient institutional ownership, suggesting that transient institutions strengthen the inverse relation between CEO cash pay and missing short-term earnings benchmarks and hence increase pressure on management, in terms of cash pay, for short-term results. Moreover, the author shows that the change in CEO cash compensation is positively associated with the change in transient institutional ownership, consistent with the idea that the selling of shares by transient institutions influences the boards of portfolio firms in CEO cash compensation decisions. This study contributes to the governance literature and is relevant to business managers by providing additional evidence that transient institutions provide less patient capital and may not benefit long-run firm value creation.
With the adoption of the Luanda Declaration at the end of the conference, it was fairly evident that African governments could no longer deny the causal links, or intersection, between the environment and health care for people across the continent.
Soil classification is the foundation for the exchange and extension of research findings in soil science and for the modern management of soil resources. This study explained the database and research methodology used to create a cross-reference system for translating the Genetic Soil Classification of China (GSCC) into the Chinese Soil Taxonomy (CST). With the help of the CST keys, each of the 2,540 soil species in GSCC has been interpreted to its corresponding soil order, suborder, great group, and subgroup in CST. Following the methodology adopted, the assigned soil species have been linked, one by one, to their corresponding polygons in the 1:1,000,000 digital soil map of China. The referencibility of each soil species between the GSCC and CST systems was determined statistically on the basis of the distribution area of each soil species at a high taxon level of the two systems. The soils were then sorted according to their maximum referencibility and classified into three categories for discussion. There were 19 soil great groups in GSCC with maximum referencibility > 90% and 22 great groups between 60% and 90%. These soil great groups could serve as cross-reference benchmarks. There were 19 great groups in GSCC with maximum referencibility < 60%, which could only serve as provisional cross-reference benchmarks until new and better results became available. For these soils, making the translation at a lower soil taxon level or on a regional basis would improve their referencibility, enabling them to serve as new cross-reference benchmarks.
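The area-weighted "maximum referencibility" and the three discussion categories described above can be computed with a short script. This is a sketch using hypothetical soil-group names and areas, not the study's actual data.

```python
# For one GSCC great group, take the CST classes its mapped polygons fall in,
# with their total areas; maximum referencibility is the largest area share.
def max_referencibility(cst_areas):
    """cst_areas: dict mapping CST class name -> total polygon area (km^2)."""
    total = sum(cst_areas.values())
    return max(cst_areas.values()) / total

def category(ref):
    """Sort a group into the three discussion categories used in the study."""
    if ref > 0.90:
        return "benchmark (>90%)"
    elif ref >= 0.60:
        return "benchmark (60%-90%)"
    return "provisional (<60%)"

# Hypothetical GSCC great group whose polygons map mostly onto one CST class:
areas = {"Ustic Cambosols": 9300.0, "Udic Argosols": 500.0, "Orthic Anthrosols": 200.0}
ref = max_referencibility(areas)  # 0.93
label = category(ref)            # "benchmark (>90%)"
```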
There exists a gap between control theory and control practice: not all control methods suggested by researchers are implemented in real systems and, on the other hand, many important industrial problems are not studied in academic research. Benchmark problems can help close this gap and provide many opportunities for members of both the control theory and application communities. The goal is to survey and give pointers to different general control and modeling related benchmark problems that can serve as inspiration for future benchmarks, and then specifically focus the benchmark coverage on automotive control engineering applications. The paper reflects on how different categories of benchmark designers, benchmark solvers, and third-party users can benefit from providing, solving, and studying benchmark problems. The paper also collects information about several benchmark problems and gives pointers to papers that give more detailed information about the different problems that have been presented.
Helicopter EMS (HEMS) allows patients to be quickly transported to regional cardiac centers, often to receive primary percutaneous coronary intervention (PCI). Since PCI is a time-critical therapy, it is important that patients get to primary PCI as quickly as possible. HEMS crews' "on-scene" times for trauma patients have been extensively studied, and recent years have seen many efforts to minimize the time required to prepare patients for transport. There has been less attention to interfacility transport "scene times" for HEMS crews at referring hospitals; this includes stabilization times for preparing cardiac patients for loading onto aircraft for HEMS transport to primary PCI. In the absence of guiding evidence, system benchmarking and quality improvement are difficult. Therefore, the current study was undertaken to assess and describe the HEMS crew "on-scene" times, or "patient stabilization times" (PSTs), at referring hospitals for interfacility transported cardiac patients flown for primary PCI. Descriptive analysis identified a PST median of 19 minutes (interquartile range 15 - 24), and univariate analyses using Kruskal-Wallis testing found no association between prolonged PST and sending unit type (Emergency Department versus other), off-hours transports, or relatively frequent (at least monthly) use of HEMS (p for all comparisons > 0.64). Outlier PSTs, defined a priori as those exceeding the median by at least a half-hour, were found in 12% of all cases. These data could be useful as a starting point for system planning and benchmarking efforts in regionalized systems of acute cardiac care.
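The descriptive statistics and the a-priori outlier definition used above (PST exceeding the median by at least 30 minutes) are simple to reproduce; the times below are illustrative, not the study's data.

```python
import statistics

def pst_summary(times_min, outlier_margin=30):
    """Median, IQR bounds, and a-priori outliers (PST >= median + margin minutes)."""
    med = statistics.median(times_min)
    q1, _q2, q3 = statistics.quantiles(times_min, n=4)
    outliers = [t for t in times_min if t >= med + outlier_margin]
    return med, (q1, q3), outliers

# Hypothetical stabilization times (minutes) for interfacility cardiac transports:
times = [15, 17, 19, 19, 21, 24, 26, 55]
med, iqr, outliers = pst_summary(times)
# med is 20; the single 55-minute case exceeds median + 30 and is flagged.
```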
Large language models (LLMs) show considerable potential to revolutionize healthcare through their performance across diverse clinical applications. Given the inherent constraints of LLMs and the critical nature of medical practice, a rigorous and systematic evaluation of their medical competence is imperative. This study presents a comprehensive review of the established methodologies and benchmarks for evaluating the medical competence of LLMs, encompassing a thorough analysis of current assessment practices across medical knowledge, clinical practice competence, and ethical-safety considerations. By integrating clinician competency assessment frameworks into LLM evaluation, we propose a structured tri-dimensional framework that systematically organizes existing evaluation approaches according to medical theoretical knowledge, clinical practice ability, and ethical-safety considerations. Furthermore, this research provides critical insights into future developmental trajectories while establishing foundational frameworks and standardization protocols for the integration of LLMs into medical practice.
Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pirá. By creating these baselines, researchers can more easily utilize Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, in which we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, the dataset has been extended in several new directions to support the aforementioned benchmarks: translation of supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges provided by the Pirá dataset.
Automatic modal identification, via automatic interpretation of the stabilization diagram, provides a key technique in bridge structural health monitoring. This paper reviews progress in the area of automatic modal identification based on interpreting the stabilization diagram. The whole identification process is divided into four steps, from establishing the stabilization diagram to removing outliers from the identification results. The criteria and algorithms used in each step in the existing studies are carefully summarized and classified. Comparisons between typical methods for cleaning and interpreting the stabilization diagram are also conducted. Real-structure benchmarks used in existing studies to validate the proposed automatic modal identification methods are also summarized. Based on the review and comparison, the specific ratio method for cleaning the stabilization diagram, the hierarchical clustering method for interpreting the stabilization diagram, and the adjusted boxplot for removing outliers from the identification results are the most suitable methods for each step. The key point of automatic modal identification based on interpreting the stabilization diagram is also discussed, and it is recommended to pay more attention to cleaning the stabilization diagram. Future studies should give more attention to automatic modal identification in situations where very few sensors are deployed. This review aims to help researchers and practitioners implement existing automatic modal identification algorithms effectively and develop more suitable and practical methods for civil engineering structures in the future.
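The clustering step recommended above (grouping stable poles that reappear across model orders into physical-mode candidates) can be illustrated with a minimal greedy grouping of candidate natural frequencies. This is a simplified sketch, not an implementation of the cited hierarchical clustering methods; the pole list and tolerance are hypothetical.

```python
import statistics

def cluster_frequencies(freqs_hz, tol=0.05):
    """Greedy grouping over sorted frequencies: a frequency within `tol` Hz of
    the previous (sorted) frequency joins its cluster; each surviving cluster
    is one physical-mode candidate."""
    clusters = []
    for f in sorted(freqs_hz):
        if clusters and f - clusters[-1][-1] <= tol:
            clusters[-1].append(f)
        else:
            clusters.append([f])
    return clusters

# Hypothetical stable poles collected over increasing model orders:
poles = [1.02, 1.03, 1.01, 2.48, 2.50, 2.51, 7.80]
modes = cluster_frequencies(poles)
# The lone 7.80 Hz pole has too few supporting model orders and is discarded,
# mimicking the outlier-removal step of the pipeline.
physical = [statistics.mean(c) for c in modes if len(c) >= 3]
```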
Purpose - Prominent at the intersections of national educational agencies, higher education, and international educational performance assessments are two reform standards: "benchmarks" for determining optimal student performance, and "empirical evidence" for determining the quality of reform practices. These two notions are often taken as connecting policy and research to effective changes in many countries. The article examines the historical and cultural principles about educational change and its sciences embedded in these standards by examining the OECD's PISA and the McKinsey & Company reports that draw on PISA's data. Findings/Originality/Value - First, the reports express salvation themes associated with modernity; that is, the promise of a better future through governing the present. The promise is to provide nations with data and models to achieve social equality, economic prosperity, and a participatory democracy. Second, the promise of the future is not descriptive of some present reality but fabricates universal characteristics of society and individuals. The numbers embody social and psychological categories about a desired unity of all students. Third, the "empirical evidence" of the international assessment entails a particular notion of science and "evidence," one that paradoxically uses these universals in comparing and creating divisions.
As artificial intelligence (AI) continues to expand exponentially, particularly with the emergence of generative pre-trained transformers (GPT) based on the transformer architecture, data processing has been revolutionized and significant improvements have been enabled in various applications. This document investigates security vulnerability detection in source code using a range of large language models (LLMs). Our primary objective is to evaluate the effectiveness of Static Application Security Testing (SAST) alongside various techniques such as prompt personas, structured outputs, and zero-shot prompting. The selected LLMs (CodeLlama 7B, DeepSeek Coder 7B, Gemini 1.5 Flash, Gemini 2.0 Flash, Mistral 7B Instruct, Phi-3 Mini 128K Instruct, Qwen 2.5 Coder, StarCoder 27B) are compared with, and combined with, Find Security Bugs. The evaluation method involves a selected dataset containing vulnerabilities, and the results provide insights for different scenarios according to software criticality (business-critical, non-critical, minimum effort, best effort). In detail, the main objectives of this study are to investigate whether large language models outperform or exceed the capabilities of traditional static analysis tools, whether combining LLMs with SAST tools leads to an improvement, and whether local machine learning models on an ordinary computer can produce reliable results. Summarizing the most important conclusions of the research, while results improve with the size of the LLM, for business-critical software the best results were obtained by SAST analysis. This differs in the "Non-Critical," "Best Effort," and "Minimum Effort" scenarios, where the combination of LLM (Gemini) + SAST obtained better results.
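Combining LLM and SAST findings, as evaluated above, amounts to merging two sets of reported vulnerability locations and scoring each against labeled ground truth. The following is a minimal sketch with hypothetical finding identifiers, not the paper's pipeline; it shows why a union typically trades precision for recall.

```python
def score(findings, ground_truth):
    """Precision and recall of a set of reported vulnerability locations."""
    tp = len(findings & ground_truth)
    precision = tp / len(findings) if findings else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical results on one project (file:line identifiers):
truth = {"a.c:10", "a.c:42", "b.c:7", "c.c:3"}
sast  = {"a.c:10", "b.c:7", "d.c:99"}          # precise but incomplete
llm   = {"a.c:42", "b.c:7", "e.c:1", "f.c:2"}  # broader, noisier

p_sast, r_sast = score(sast, truth)            # 2/3 precision, 2/4 recall
p_comb, r_comb = score(sast | llm, truth)      # union: recall rises, precision drops
```

Which trade-off is acceptable is exactly what the criticality scenarios (business-critical vs. minimum effort) capture: missed vulnerabilities are costlier in critical software, so higher recall can justify more false positives.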
Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Because real-world proprietary datasets are too large to be copied onto a test storage system, and most data cannot be shared due to privacy issues, a benchmark needs to generate data synthetically. To ensure that the result is accurate, it is necessary to generate data content based on a characterization of the real-world data properties that influence storage system performance during the execution of a benchmark. The existing approach, called SDGen, cannot guarantee that the benchmark result is accurate in storage systems that have built-in word-based compressors. The reason is that SDGen characterizes the properties that influence compression performance only at the byte level, and no properties are characterized at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds the word corpus by segmenting real-world text datasets, and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation. The Monte Carlo approach is used to generate synthetic data. The running time of TextGen generation depends only on the expected data size, which means that the time complexity of TextGen is O(n). To evaluate TextGen, four real-world datasets were used to perform an experiment.
The experimental results show that, compared with SDGen, the compression performance and compression ratio of the datasets generated by TextGen deviate less from real-world datasets when end-tagged dense code, a representative word-based compressor, is evaluated.
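TextGen's generation pipeline as described (segment a corpus into words, build a word-frequency distribution, then Monte Carlo sampling in O(n) of the output size) can be sketched in a few lines. Here the lognormal MLE fitting step is simplified to direct frequency-weighted sampling, so this illustrates only the Monte Carlo idea, not the authors' implementation; the corpus is a toy stand-in.

```python
import random

def build_distribution(corpus_text):
    """Segment a corpus into words and count per-word frequencies."""
    counts = {}
    for word in corpus_text.split():
        counts[word] = counts.get(word, 0) + 1
    words = list(counts)
    weights = [counts[w] for w in words]
    return words, weights

def generate(words, weights, target_bytes, seed=0):
    """Monte Carlo generation: sample words i.i.d. (weighted by frequency)
    until the target size is reached, so running time is O(n) in the
    requested data size."""
    rng = random.Random(seed)
    out = []
    size = 0
    while size < target_bytes:
        w = rng.choices(words, weights)[0]
        out.append(w)
        size += len(w) + 1  # word plus separator
    return " ".join(out)

corpus = "the quick brown fox jumps over the lazy dog the fox"
words, weights = build_distribution(corpus)
synthetic = generate(words, weights, target_bytes=200)
```

Because the synthetic stream reuses real word tokens with realistic frequencies, a word-based compressor sees dictionary statistics closer to the original data than byte-level generation would produce.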
This research presents a novel nature-inspired metaheuristic optimization algorithm, called the Narwhale Optimization Algorithm (NWOA). The algorithm draws inspiration from the foraging and prey-hunting strategies of narwhals, "unicorns of the sea," particularly the use of their distinctive spiral tusks, which play significant roles in hunting, searching for prey, navigation, echolocation, and complex social interaction. In particular, the NWOA imitates the foraging strategies and techniques of narwhals when hunting for prey, but focuses mainly on the cooperative and exploratory behavior shown during group hunting and on the use of their tusks for sensing and locating prey under the Arctic ice. These functions provide a strong basis for assessing the algorithm's prowess at balancing exploration and exploitation, convergence speed, and solution accuracy. The performance of the NWOA is evaluated on 30 benchmark test functions. A comparison study using the Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Perfumer Optimization Algorithm (POA), Candle Flame Optimization (CFO) Algorithm, Particle Swarm Optimization (PSO) Algorithm, and Genetic Algorithm (GA) validates the results. As evidenced in the experimental results, NWOA is capable of yielding outcomes competitive with these well-known optimizers, and superior ones in several instances. These results suggest that NWOA is an effective and robust optimization tool suitable for solving many different complex optimization problems from the real world.
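The abstract does not give the NWOA's update equations, but the exploration-exploitation balance it emphasizes is the common skeleton of swarm metaheuristics. The sketch below is a generic population-based loop on the sphere benchmark function, purely illustrative of that structure and not the published NWOA rules.

```python
import random

def sphere(x):
    """Classic benchmark: global minimum 0 at the origin."""
    return sum(v * v for v in x)

def swarm_optimize(f, dim=5, pop=20, iters=200, seed=1):
    """Generic exploration/exploitation loop: each agent moves partly toward
    the best-known solution (exploitation) and partly at random (exploration),
    with the random step shrinking over time."""
    rng = random.Random(seed)
    agents = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
    best = list(min(agents, key=f))  # copy, since agents mutate in place
    for t in range(iters):
        step = 1.0 - t / iters  # decreasing exploration schedule
        for a in agents:
            for d in range(dim):
                a[d] += 0.5 * (best[d] - a[d]) + step * rng.uniform(-0.5, 0.5)
        cand = min(agents, key=f)
        if f(cand) < f(best):
            best = list(cand)
    return best, f(best)

best, fbest = swarm_optimize(sphere)
# fbest should be far below the fitness of a random starting point.
```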
The development of chemical technologies, which involves a multistage process spanning laboratory research and scale-up to industrial deployment and necessitates interdisciplinary collaboration, is often accompanied by substantial time and economic costs. To address these challenges, in this work, we report ChemELLM, a domain-specific large language model (LLM) with 70 billion parameters for chemical engineering. ChemELLM demonstrates state-of-the-art performance across critical tasks ranging from foundational understanding to professional problem-solving. It outperforms mainstream LLMs (e.g., O1-Preview, GPT-4o, and DeepSeek-R1) on ChemEBench, the first multidimensional benchmark for chemical engineering, which encompasses 15 dimensions across 101 distinct essential tasks. To support robust model development, we curated ChemEData, a purpose-built dataset containing 19 billion tokens for pre-training and 1 billion tokens for fine-tuning. This work establishes a new paradigm for artificial intelligence-driven innovation, bridging the gap between laboratory-scale innovation and industrial-scale implementation, thus accelerating technological advancement in chemical engineering. ChemELLM is publicly available at https://chemindustry.iflytek.com/chat.
This study employs density functional theory (DFT) calculations to systematically investigate the B-H bond dissociation enthalpies (BDEs) of Lewis base-borane complexes. A rigorous benchmark analysis identified ωB97XD/cc-pVTZ as a reliable method for accurate prediction of B-H BDEs. An examination of more than 200 structurally diverse complexes across five major classes revealed that the type of Lewis base significantly influences the BDEs, in the order amine-borane > phosphine-borane > N-heterocyclic carbene-borane > pyridine-borane. Solvent-stabilized boranes exhibit the broadest range of BDE values due to the diverse coordination modes of solvent molecules with borane. Further analysis revealed that the BDE values are synergistically affected by skeletal and substituent effects. Notably, a strong linear correlation (R² up to 0.97) between the spin density of boryl radicals and the BDEs, except for amine-boranes, provides a robust predictive model. This research enhances the fundamental understanding of B-H bond dissociation properties in Lewis base-boranes and provides valuable insights for the development of new boron-based methodologies in organic synthesis.
Funding: Supported in part by the National Natural Science Foundation of China (62125306), the Zhejiang Key Research and Development Project (2024C01163), and the State Key Laboratory of Industrial Control Technology, China (ICT2024A06).
Funding: Project supported by the National Natural Science Foundation of China (No. 40471081), the Frontal Field Project of the Chinese Academy of Sciences (No. ISSASIP0201), and the Key Innovation Project of the Chinese Academy of Sciences (No. KZCX3-SW-427).
Funding: Guangzhou Science and Technology Program, Grant/Award Numbers: 2025B03J0110, 2024A03J1074, 2024A03J0927.
Funding: The work was carried out at the Center for Artificial Intelligence (C4AI-USP) with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and from the IBM Corporation. This research was also partially supported by Itaú Unibanco S.A. M.M. José and F. Nakasato have been supported by the Itaú Scholarship Program (PBI) of the Data Science Center (C2D) of the Escola Politécnica da Universidade de São Paulo. We acknowledge support by CAPES, Finance Code 001. A.H.R. Costa and F.G. Cozman were partially supported by CNPq grants 310085/2020-9 and 305753/2022-3, respectively. Paulo Pirozelli was supported by FAPESP grant 2019/26762-0.
Abstract: Pirá is a reading comprehension dataset focused on the ocean, the Brazilian coast, and climate change, built from a collection of scientific abstracts and reports on these topics. This dataset represents a versatile language resource, particularly useful for testing the ability of current machine learning models to acquire expert scientific knowledge. Despite its potential, a detailed set of baselines has not yet been developed for Pirá. By creating these baselines, researchers can more easily utilize Pirá as a resource for testing machine learning models across a wide range of question answering tasks. In this paper, we define six benchmarks over the Pirá dataset, covering closed generative question answering, machine reading comprehension, information retrieval, open question answering, answer triggering, and multiple choice question answering. As part of this effort, we have also produced a curated version of the original dataset, in which we fixed a number of grammar issues, repetitions, and other shortcomings. Furthermore, the dataset has been extended in several new directions to support the aforementioned benchmarks: translation of supporting texts from English into Portuguese, classification labels for answerability, automatic paraphrases of questions and answers, and multiple choice candidates. The results described in this paper provide several points of reference for researchers interested in exploring the challenges posed by the Pirá dataset.
Funding: Supported by the National Key R&D Program of China (No. 2019YFB1600702) and the National Natural Science Foundation of China (No. 51878059).
Abstract: Automatic modal identification via automatic interpretation of the stabilization diagram is a key technique in bridge structural health monitoring. This paper reviews progress in automatic modal identification based on interpreting the stabilization diagram. The whole identification process is divided into four steps, from establishing the stabilization diagram to removing outliers from the identification results. The criteria and algorithms used in each step in the existing studies are carefully summarized and classified. Comparisons between typical methods for cleaning and interpreting the stabilization diagram are also conducted. Real structure benchmarks used in the existing studies to validate the proposed automatic modal identification methods are also summarized. Based on the review and comparison, the specific ratio method for cleaning the stabilization diagram, the hierarchical clustering method for interpreting the stabilization diagram, and the adjusted boxplot for removing outliers from the identification results are the most suitable methods for each step. The key point of automatic modal identification based on interpreting the stabilization diagram is also discussed, and it is recommended to pay more attention to cleaning the stabilization diagram. Future research should give more attention to automatic modal identification in situations where very few sensors are deployed. This review aims to help researchers and practitioners implement existing automatic modal identification algorithms effectively and develop more suitable and practical methods for civil engineering structures in the future.
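As a rough illustration of the hierarchical-clustering step recommended above, the sketch below groups hypothetical pole estimates from a stabilization diagram using a combined frequency/damping distance. The pole values, distance weights, and cut-off threshold are all assumptions for illustration, not settings from the reviewed methods.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical pole estimates (frequency in Hz, damping ratio) collected
# across model orders of a stabilization diagram; illustrative values only.
poles = np.array([
    [1.02, 0.011], [1.01, 0.012], [1.03, 0.010],   # stable physical mode ~1 Hz
    [2.51, 0.021], [2.49, 0.019], [2.50, 0.020],   # stable physical mode ~2.5 Hz
    [4.80, 0.150],                                  # likely spurious pole
])

# Distance combining relative frequency and damping differences; equal
# weighting here is an assumption.
def pole_distance(a, b):
    df = abs(a[0] - b[0]) / max(a[0], b[0])
    dz = abs(a[1] - b[1]) / max(a[1], b[1])
    return df + dz

n = len(poles)
# Condensed pairwise distance vector, as expected by scipy's linkage
dists = [pole_distance(poles[i], poles[j]) for i in range(n) for j in range(i + 1, n)]

Z = linkage(dists, method="average")
labels = fcluster(Z, t=0.2, criterion="distance")

# Clusters with many members correspond to physical modes; singletons are
# likely spurious poles to be discarded.
for lab in np.unique(labels):
    members = poles[labels == lab]
    print(lab, len(members), members[:, 0].mean())
```

In a real pipeline the cut-off threshold and distance weights would be tuned, and the adjusted boxplot mentioned in the review would then prune outliers within each retained cluster.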
Abstract: Purpose - Prominent at the intersections of national educational agencies, higher education, and international educational performance assessments are two reform standards: "benchmarks" determining optimal student performance, and "empirical evidence" for determining the quality of reform practices. These two notions are often taken as connecting policy and research to effective changes in many countries. The article examines the historical and cultural principles about educational change and its sciences embedded in these standards through an examination of the OECD's PISA and the McKinsey & Company reports that draw on PISA's data. Findings/Originality/Value - First, the reports express salvation themes associated with modernity; that is, the promise of a better future through governing the present. The promise is to provide nations with data and models to achieve social equality, economic prosperity, and a participatory democracy. Second, the promise of the future is not descriptive of some present reality but fabricates universal characteristics about society and individuals. The numbers embody social and psychological categories about a desired unity of all students. Third, the "empirical evidence" of the international assessment entails a particular notion of science and "evidence", one that paradoxically uses these universals in comparing and creating divisions.
Abstract: Artificial intelligence (AI) continues to expand exponentially, particularly with the emergence of generative pre-trained transformers (GPT) based on the transformer architecture, which has revolutionized data processing and enabled significant improvements in various applications. This document investigates security vulnerability detection in source code using a range of large language models (LLMs). Our primary objective is to evaluate the effectiveness of Static Application Security Testing (SAST) by applying various techniques such as prompt persona, structured outputs, and zero-shot prompting to a selection of LLMs (CodeLlama 7B, DeepSeek Coder 7B, Gemini 1.5 Flash, Gemini 2.0 Flash, Mistral 7B Instruct, Phi-3 Mini 128K Instruct, Qwen 2.5 Coder, StarCoder 27B), compared with and combined with Find Security Bugs. The evaluation uses a selected dataset containing vulnerabilities, and the results provide insights for different scenarios according to software criticality (business critical, non-critical, minimum effort, best effort). In detail, the main objectives of this study are to investigate whether large language models outperform or exceed the capabilities of traditional static analysis tools, whether combining LLMs with SAST tools leads to an improvement, and whether local machine learning models on an ordinary computer can produce reliable results. Summarizing the most important conclusions of the research: while results did improve with LLM size for business-critical software, the best results were obtained by SAST analysis. This differs in the "Non-Critical," "Best Effort," and "Minimum Effort" scenarios, where the combination of LLM (Gemini) + SAST obtained better results.
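The zero-shot, persona-plus-structured-output prompting described above can be sketched as follows. `build_prompt` and `parse_response` are hypothetical helpers invented for illustration, and the actual model call is omitted; the paper does not specify its prompt wording or output schema.

```python
import json

# Persona and output schema are assumptions for illustration, not the
# study's actual prompt or schema.
PERSONA = "You are a senior application security auditor."
SCHEMA = {"vulnerable": "bool", "cwe": "string", "explanation": "string"}

def build_prompt(source_code: str) -> str:
    # Zero-shot prompt: persona + task + structured-output contract,
    # with no worked examples included.
    return (
        f"{PERSONA}\n"
        "Analyze the following code for security vulnerabilities.\n"
        f"Respond ONLY with JSON matching this schema: {json.dumps(SCHEMA)}\n\n"
        f"--- BEGIN CODE ---\n{source_code}\n--- END CODE ---"
    )

def parse_response(text: str) -> dict:
    # Structured-output parsing: reject anything that is not valid JSON
    # carrying the expected keys.
    data = json.loads(text)
    assert set(SCHEMA) <= set(data)
    return data

prompt = build_prompt("strcpy(buf, user_input);")
print(prompt[:60])
```

In the combined LLM + SAST setup described in the abstract, findings parsed this way would be merged with Find Security Bugs reports rather than used alone.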
Funding: Project supported by the National Natural Science Foundation of China (Nos. 61572394 and 61272098), the Shenzhen Fundamental Research Plan (Nos. JCYJ20120615101127404 and JSGG20140519141854753), and the National Key Technologies R&D Program of China (No. 2011BAH04B03).
Abstract: Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Because real-world proprietary datasets are too large to be copied onto a test storage system, and most data cannot be shared due to privacy issues, a benchmark needs to generate data synthetically. To ensure that the result is accurate, it is necessary to generate data content based on the characterization of real-world data properties that influence the storage system performance during the execution of a benchmark. The existing approach, called SDGen, cannot guarantee that the benchmark result is accurate in storage systems that have built-in word-based compressors. The reason is that SDGen characterizes the properties that influence compression performance only at the byte level, and no properties are characterized at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds the word corpus by segmenting real-world text datasets, and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation. The Monte Carlo approach is used to generate synthetic data. The running time of TextGen generation depends only on the expected data size, which means that the time complexity of TextGen is O(n). To evaluate TextGen, four real-world datasets were used to perform an experiment. The experimental results show that, compared with SDGen, the compression performance and compression ratio of the datasets generated by TextGen deviate less from real-world datasets when end-tagged dense code, a representative of word-based compressors, is evaluated.
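A minimal sketch of the pipeline described above (segment, count, fit a lognormal by MLE, generate by Monte Carlo), assuming plain whitespace segmentation and a toy corpus. The nearest-frequency word mapping at the end is a simplification for illustration, not TextGen's actual sampling scheme.

```python
import collections
import math
import random

# Toy corpus; illustrative only. TextGen segments real-world text datasets.
corpus = "the quick brown fox jumps over the lazy dog the fox ran"
counts = collections.Counter(corpus.split())
words = list(counts)
freqs = [counts[w] for w in words]

# MLE fit of a lognormal to the word frequencies: mu and sigma are the
# mean and standard deviation of the log-frequencies.
logs = [math.log(f) for f in freqs]
mu = sum(logs) / len(logs)
sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs)) or 1e-9

# Monte Carlo generation: draw a lognormal "frequency" per emitted word
# and map it to the word with the closest observed frequency. Each draw
# is O(vocabulary), so total time scales with the expected output size n.
ranked = sorted(zip(freqs, words))
def sample_word(rng):
    draw = rng.lognormvariate(mu, sigma)
    return min(ranked, key=lambda fw: abs(fw[0] - draw))[1]

rng = random.Random(42)
synthetic = " ".join(sample_word(rng) for _ in range(20))
print(synthetic)
```

A real implementation would invert the fitted CDF into word ranks rather than scan the vocabulary per draw, but the statistical idea is the same.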
Abstract: This research presents a novel nature-inspired metaheuristic optimization algorithm, called the Narwhale Optimization Algorithm (NWOA). The algorithm draws inspiration from the foraging and prey-hunting strategies of narwhals, the "unicorns of the sea", particularly the use of their distinctive spiral tusks, which play significant roles in hunting, searching for prey, navigation, echolocation, and complex social interaction. In particular, the NWOA imitates the foraging strategies and techniques of narwhals when hunting for prey, focusing mainly on the cooperative and exploratory behavior shown during group hunting and on the use of their tusks for sensing and locating prey under the Arctic ice. These behaviors provide a strong basis for assessing the algorithm's ability to balance exploration and exploitation, its convergence speed, and its solution accuracy. The performance of the NWOA is evaluated on 30 benchmark test functions. A comparison study using the Grey Wolf Optimizer (GWO), Whale Optimization Algorithm (WOA), Perfumer Optimization Algorithm (POA), Candle Flame Optimization (CFO) Algorithm, Particle Swarm Optimization (PSO) Algorithm, and Genetic Algorithm (GA) validates the results. As evidenced by the experimental results, NWOA yields competitive outcomes among these well-known optimizers and outperforms them in several instances. These results suggest that NWOA is an effective and robust optimization tool suitable for solving many different complex optimization problems from the real world.
Abstract: The development of chemical technologies, which involves a multistage process covering laboratory research and scale-up to industrial deployment and necessitates interdisciplinary collaboration, is often accompanied by substantial time and economic costs. To address these challenges, in this work we report ChemELLM, a domain-specific large language model (LLM) with 70 billion parameters for chemical engineering. ChemELLM demonstrates state-of-the-art performance across critical tasks ranging from foundational understanding to professional problem-solving. It outperforms mainstream LLMs (e.g., O1-Preview, GPT-4o, and DeepSeek-R1) on ChemEBench, the first multidimensional benchmark for chemical engineering, which encompasses 15 dimensions across 101 distinct essential tasks. To support robust model development, we curated ChemEData, a purpose-built dataset containing 19 billion tokens for pre-training and 1 billion tokens for fine-tuning. This work establishes a new paradigm for artificial-intelligence-driven innovation, bridging the gap between laboratory-scale innovation and industrial-scale implementation and thus accelerating technological advancement in chemical engineering. ChemELLM is publicly available at https://chemindustry.iflytek.com/chat.
Funding: Supported by the USTC Research Funds of the Double First-Class Initiative (YD2060006004, YD2060002027) and the National Natural Science Foundation of China (22325107, 22171253, 22293011).
Abstract: This study employs density functional theory (DFT) calculations to systematically investigate the B–H bond dissociation enthalpies (BDEs) of Lewis base–borane complexes. A rigorous benchmark analysis identified ωB97XD/cc-pVTZ as a reliable method for accurate prediction of B–H BDEs. An examination of more than 200 structurally diverse complexes across five major classes revealed that the type of Lewis base significantly influences the BDEs, in the order amine–borane > phosphine–borane > N-heterocyclic carbene–borane > pyridine–borane. Solvent-stabilized boranes exhibit the broadest range of BDE values due to the diverse coordination modes of solvent molecules with borane. Further analysis revealed that the BDE values are synergistically affected by skeletal and substituent effects. Notably, a strong linear correlation (R² up to 0.97) between the spin density of boryl radicals and the BDEs, except for amine–boranes, provides a robust predictive model. This research enhances the fundamental understanding of B–H bond dissociation properties in Lewis base–boranes and provides valuable insights for the development of new boron-based methodologies in organic synthesis.
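The spin-density/BDE correlation reported above amounts to an ordinary least-squares fit with an R² check. The sketch below uses invented (spin density, BDE) pairs solely to illustrate the computation; they are not values from the study.

```python
import numpy as np

# Hypothetical (boryl radical spin density, BDE in kcal/mol) pairs; the
# roughly linear trend mimics the reported correlation, but every number
# here is invented for illustration.
spin = np.array([0.55, 0.60, 0.65, 0.70, 0.75, 0.80])
bde  = np.array([92.0, 89.5, 87.2, 84.8, 82.1, 80.0])

# Least-squares line and coefficient of determination R^2
slope, intercept = np.polyfit(spin, bde, 1)
pred = slope * spin + intercept
ss_res = np.sum((bde - pred) ** 2)
ss_tot = np.sum((bde - bde.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r2:.3f}")
```

With a fit like this, predicting a BDE from a computed spin density is just `slope * s + intercept`, which is what makes the correlation usable as a predictive model.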