Journal articles
595 articles found
Systematic Benchmarking of Topology Optimization Methods Using Both Binary and Relaxed Forms of the Zhou-Rozvany Problem
1
Authors: Jiye Zhou, Yun-Fei Fu, Kazem Ghabraie. Computer Modeling in Engineering & Sciences, 2025, No. 6, pp. 3233-3251 (19 pages).
Most material distribution-based topology optimization methods work on a relaxed form of the optimization problem and then push the solution toward the binary limits. However, when benchmarking these methods, researchers use known solutions to only a single form of benchmark problem. This paper proposes a comparison platform for systematic benchmarking of topology optimization methods using both binary and relaxed forms. A greyness measure is implemented to evaluate how far a solution is from the desired binary form. The well-known Zhou-Rozvany (ZR) problem is selected as the benchmarking problem here, making use of available global solutions for both its relaxed and binary forms. The recently developed non-penalization Smooth-Edged Material Distribution for Optimizing Topology (SEMDOT), the well-established Solid Isotropic Material with Penalization (SIMP), and continuation methods are studied on this platform. Interestingly, in most cases, the grayscale solutions obtained by SEMDOT perform better on the ZR problem than those obtained by SIMP. The reasons are investigated and attributed to the two different regularization techniques used: the Heaviside smooth function in SEMDOT and the power-law penalty in SIMP. More importantly, a simple-to-use benchmarking graph is proposed for evaluating newly developed topology optimization methods.
Keywords: Topology optimization; Zhou-Rozvany problem; benchmarking; binary forms; relaxed forms; power-law penalty; Heaviside smooth function
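The abstract names a greyness measure and the SIMP power-law penalty without defining them. As a minimal sketch, assuming the greyness measure is the widely used measure of non-discreteness (the mean of 4ρ(1−ρ) over element densities; the paper's exact definition may differ) and using the standard modified-SIMP interpolation with an assumed penalty p = 3, the two can be written as follows. SEMDOT's Heaviside-based scheme is not reproduced here.

```python
# Sketch only. Assumptions: greyness() uses the common "measure of
# non-discreteness"; simp_modulus() is the standard modified-SIMP
# interpolation with an assumed penalty p = 3. Neither is taken from the paper.

def greyness(densities: list[float]) -> float:
    """0.0 for a fully binary (0/1) design, 1.0 when every element is 0.5."""
    if not densities:
        raise ValueError("need at least one element density")
    return sum(4.0 * r * (1.0 - r) for r in densities) / len(densities)


def simp_modulus(rho: float, e0: float = 1.0, e_min: float = 1e-9, p: float = 3.0) -> float:
    """Modified SIMP: E(rho) = E_min + rho**p * (E0 - E_min).

    With p > 1, intermediate densities buy little stiffness for their
    material cost, which penalizes grey elements.
    """
    return e_min + rho ** p * (e0 - e_min)


if __name__ == "__main__":
    print(greyness([0.0, 1.0, 1.0, 0.0]))  # binary layout
    print(greyness([0.5, 0.5]))            # fully grey layout
    print(simp_modulus(0.5))               # roughly 0.125 of full stiffness
```

A binary layout scores 0.0 and an all-0.5 layout scores 1.0; with p = 3 a half-density element supplies only about 12.5% of full stiffness, which is the usual explanation for why SIMP drives designs toward 0/1.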
Deep Learning for Video Summarization: Systematic Review, Challenges and Opportunities
2
Authors: Qinghao Yu, Zidong Wang, Guoliang Wei, Hui Yu. IEEE/CAA Journal of Automatica Sinica, 2026, No. 1, pp. 21-42 (22 pages).
The exponential growth of video content has driven significant advances in video summarization techniques in recent years. Breakthroughs in deep learning have been particularly transformative, enabling more effective detection of key information and creating new possibilities for video synopsis. To summarize recent progress and accelerate research in this field, this paper provides a comprehensive review of deep learning-based video summarization methods developed over the past decade. We begin by examining the research landscape of video abstraction technologies and identifying core challenges in video summarization. We then systematically analyze the prevailing deep learning frameworks and methodologies employed in current video summarization systems, offering researchers a clear roadmap of the field's evolution. Unlike previous reviews, we first classify research papers by the structural hierarchy of the video (from frame level to shot level to video level), then further categorize them by the summary backbone model (feature extraction and spatiotemporal modeling). This approach provides a more systematic and hierarchical organization of the literature. Following this review, we summarize the benchmark datasets and evaluation metrics commonly used in the field. Finally, we analyze persistent challenges and propose insightful directions for future research, providing a forward-looking perspective on video summarization technologies. This systematic literature review is of considerable reference value to researchers new to deep learning and video summarization.
Keywords: Benchmark datasets; deep learning; evaluation protocols; video abstraction; video summarization; video synopsis
Transforming Healthcare with State-of-the-Art Medical-LLMs: A Comprehensive Evaluation of Current Advances Using a Benchmarking Framework
3
Authors: Himadri Nath Saha, Dipanwita Chakraborty Bhattacharya, Sancharita Dutta, Arnab Bera, Srutorshi Basuray, Satyasaran Changdar, Saptarshi Banerjee, Jon Turdiev. Computers, Materials & Continua, 2026, No. 2, pp. 234-289 (56 pages).
The emergence of medical large language models has significantly transformed healthcare. Medical Large Language Models (Med-LLMs) serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets, and provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses the deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in the existing literature.
Keywords: Medical large language models (Med-LLM); AI in healthcare; natural language processing (NLP) in medicine; fine-tuning medical LLMs; retrieval-augmented generation (RAG) in medicine; multi-modal learning in healthcare; explainability and transparency in medical AI; FDA regulations for AI in medicine; evaluation and benchmarking of medical large language models
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models (Cited by: 7)
4
Authors: Mianxin Liu, Weiguo Hu, Jinru Ding, Jie Xu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang. Big Data Mining and Analytics (CSCD), 2024, No. 4, pp. 1116-1128 (13 pages).
Ensuring the general efficacy and benefit to humans of medical large language models (LLMs) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce “MedBench”, a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the currently largest evaluation dataset (300,901 questions), covering 43 clinical specialties, and performs multi-faceted evaluation of medical LLMs. Second, MedBench provides a standardized, fully automatic, cloud-based evaluation infrastructure with physical separation between questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results that largely align with medical professionals' perspectives. This study lays a significant foundation for the practical application of Chinese medical LLMs.
Keywords: Medical Large Language Model (MLLM); benchmark; platform; open-source
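The abstract does not say how MedBench's dynamic evaluation works. One common way to blunt answer memorization in multiple-choice benchmarks is to re-order the answer options on each run and remap the answer key; the sketch below illustrates only that general idea. `shuffle_options` is a hypothetical helper, not part of MedBench.

```python
import random

# Hypothetical illustration of option shuffling for multiple-choice items;
# this is NOT MedBench's documented mechanism.

def shuffle_options(options: list[str], answer_idx: int, seed: int) -> tuple[list[str], int]:
    """Return a permuted copy of the options and the new index of the correct one."""
    rng = random.Random(seed)          # per-run seed keeps an evaluation reproducible
    order = list(range(len(options)))
    rng.shuffle(order)                 # order[k] = original index now shown at position k
    shuffled = [options[i] for i in order]
    return shuffled, order.index(answer_idx)
```

A model that memorized "the answer is C" gains nothing once the correct option can land at any position, while a fixed seed per run keeps results repeatable.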
Satellite-Derived Geospatial Irrigation Performance Indicators for Benchmarking Studies of Irrigation Systems (Cited by: 1)
5
Authors: A. V. Suresh Babu, M. Shanker, V. Venkateshwar Rao. Advances in Remote Sensing, 2012, No. 1, pp. 1-13 (13 pages).
Development of irrigation infrastructure and its efficient management are primary concerns for sustainable food production. The assessment of irrigation infrastructure creation, its utilization, and diagnostic evaluation of the various performance indices (monitoring) are important for measuring efficiency. Benchmarking of Irrigation Systems (BIS) is a diagnostic analysis of irrigation performance indicators comprising the Irrigation Infrastructure System (IIS), the Agricultural System (AS), and Water Delivery Dynamics (WDD). Since the performance of an irrigation command varies in space and time, spatial information technologies, viz. Remote Sensing (RS), Geographical Information Systems (GIS), and Global Positioning Systems (GPS), are useful for providing spatial information on several indices in the benchmarking (BM) process. This study discusses and demonstrates, with suitable examples, the information requirements for BIS at different stages and the use of spatial information technologies to derive irrigation performance indicators. The studies carried out indicate that the geospatial approach to BIS improved data collection methods, diagnostic analysis, and spatio-temporal visualisation of BM indicators at the disaggregated canal level, which would be useful for decision support during corrective management measures. The conjunctive use of multi-date (medium-resolution) satellite data, high-spatial-resolution data, and field data on water deliveries was found to be an alternative to conventional non-spatial approaches to BIS, thereby enabling better water resources planning and management.
Keywords: Irrigation; agriculture; benchmarking of irrigation systems; geospatial techniques; remote sensing; GIS
Developing an Energy Benchmarking System for Hotel Buildings Using the Statistical Method and the Simulation-Based Approach
6
Authors: Anh Tuan Nguyen, David Rockwood. Journal of Green Building, 2019, No. 3, pp. 3-22 (20 pages).
Due to increased tourist activity, many cities now have a large number of hotel buildings. Measures to evaluate energy use intensity are needed to manage energy consumption in this sector effectively. This study uses a combined strategy to establish an energy benchmark for hotel buildings in Vietnam. First, a survey and analysis of actual building stock data for 50 hotels in Danang, Vietnam, was conducted. The survey-based benchmark and its related data were then used to build a reference energy model, from which an energy benchmark for other climatic regions in Vietnam was estimated using the energy simulation method. The results reveal that the average energy use intensity for hotels in Danang was 87.4 kWh/m²·year, or 8628.6 kWh/viproom·year. However, because comfort expectations differ, this study proposes that hotels of different grades should have separate benchmarks. It also proposes an energy intensity-based rating scale with 7 grades, from the least energy intensive (grade A) to the most energy intensive (grade G), which can be used to manage, label, or encourage sustainable energy use in hotel buildings. The relationship between energy use intensity and hotel occupancy rate is reported, compared, and explained; occupancy rate was found to have no significant impact on energy use intensity. From the survey results, predictive models were developed to estimate the annual energy consumption of hotel buildings based on their grades. Simulated benchmarks for other regions were also obtained. The results demonstrate many potential applications in the management, design and construction, and renovation of this building type.
Keywords: energy benchmarking; hotel; energy use intensity; energy labeling; energy rating scale; simulation-based benchmarking
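Energy use intensity (EUI) is annual energy divided by floor area, and the A-to-G scale maps EUI into seven ordered classes. A minimal sketch follows; the six class boundaries used in the example are invented for illustration, since the abstract does not give the paper's actual thresholds.

```python
from bisect import bisect_right

GRADES = "ABCDEFG"  # A = least energy intensive, G = most energy intensive


def energy_use_intensity(annual_kwh: float, floor_area_m2: float) -> float:
    """Energy use intensity in kWh per m² per year."""
    if floor_area_m2 <= 0:
        raise ValueError("floor area must be positive")
    return annual_kwh / floor_area_m2


def rating(eui: float, boundaries: list[float]) -> str:
    """Map an EUI to a grade using six ascending class boundaries (7 grades).

    An EUI exactly equal to a boundary falls into the worse (higher) grade.
    """
    if len(boundaries) != len(GRADES) - 1 or boundaries != sorted(boundaries):
        raise ValueError("need 6 ascending boundaries")
    return GRADES[bisect_right(boundaries, eui)]


if __name__ == "__main__":
    HYPOTHETICAL_BOUNDS = [50, 70, 90, 110, 130, 150]  # kWh/m²·year, illustrative only
    eui = energy_use_intensity(874_000, 10_000)        # 87.4, the Danang mean from the abstract
    print(eui, rating(eui, HYPOTHETICAL_BOUNDS))
```

Per-grade benchmarks, as the abstract proposes, would simply use a different boundary list for each hotel grade.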
Medium-term Air Quality Benchmarking for Ecosystem Monitoring and Sustainability Planning: Case Study Dallas County (U.S.A.) 2015 to 2020
7
Author: David A. Wood. Research in Ecology, 2021, No. 4, pp. 35-53 (19 pages).
Medium-term air quality assessment, benchmarked against recent past data, can usefully complement short-term air quality index data for monitoring purposes. By using daily and monthly averaged data, medium-term air quality benchmarking provides a distinctive perspective from which to monitor air quality for sustainability planning and ecosystem purposes. Normalizing the data for individual air pollutants to a standard scale allows them to be integrated more easily into a daily combined local area benchmark (CLAB). The objectives of the study are to demonstrate that medium-term air quality benchmarking can be tailored to reflect local conditions by selecting the most relevant pollutants to incorporate in the CLAB indicator. Such a benchmark can provide an overall air quality assessment for areas of interest. A case study is presented for Dallas County (U.S.A.), applying the proposed method by benchmarking 2020 air pollutant data against trends established for 2015 to 2019. The six air pollutants considered are ozone, carbon monoxide, nitrogen dioxide, sulfur dioxide, benzene, and particulate matter less than 2.5 micrometres. These pollutants are assessed individually and in terms of CLAB, and their 2020 variations for Dallas County are compared with daily trends established for 2015 to 2019. Reductions in benzene and carbon monoxide during much of 2020 are clearly discernible compared with preceding years. The CLAB indicator shows clear seasonal air quality trends for 2015 to 2019, with higher pollution in winter and spring than in other seasons, strongly influenced by climatic variations with some anthropogenic inputs. Conducting CLAB analysis on an ongoing basis, using a relevant near-past benchmarking interval covering several years, can reveal useful monthly, seasonal, and annual trends in overall air quality. This type of medium-term, benchmarked air quality data analysis is well suited to ecosystem monitoring.
Keywords: Local air pollution assessment; medium-term air quality; local area benchmarking; critical pollutants; seasonal variations in air quality; sustainability planning
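The abstract says each pollutant is normalized to a standard scale before being combined into the CLAB, but it does not give the scaling. The sketch below assumes min-max scaling against the benchmark period (for example 2015 to 2019) followed by an unweighted mean over the selected pollutants; both choices are assumptions, not the paper's published formula.

```python
from statistics import mean

# Assumed CLAB construction: min-max normalization against the benchmark
# period, then an unweighted mean across the chosen pollutants.


def normalise(value: float, baseline: list[float]) -> float:
    """Scale a reading to 0..1 relative to the benchmark-period min and max.

    Readings outside the benchmark range fall below 0 or above 1.
    """
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread")
    return (value - lo) / (hi - lo)


def clab(day: dict[str, float], baselines: dict[str, list[float]]) -> float:
    """Daily combined local area benchmark: mean of normalized pollutant values.

    Only pollutants present in both inputs are used, so the indicator is
    tailored by choosing which pollutants to put in `baselines`.
    """
    shared = sorted(day.keys() & baselines.keys())
    if not shared:
        raise ValueError("no pollutants in common")
    return mean(normalise(day[p], baselines[p]) for p in shared)
```

Choosing the pollutants through `baselines` mirrors the abstract's point that the indicator can be tailored to local conditions.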
OpenPoly: A Polymer Database Empowering Benchmarking and Multiproperty Predictions
8
Authors: Ji-Feng Wang, Yu-Bo Sun, Qiu-Tong Chen, Fei-Fan Ji, Yuan-Yuan Song, Meng-Yuan Ruan, Ying Wang. Chinese Journal of Polymer Science, 2025, No. 10, pp. 1749-1760 (12 pages).
Advancing the integration of artificial intelligence and polymer science requires high-quality, open-source, and large-scale datasets. However, existing polymer databases often suffer from data sparsity, a lack of polymer-property labels, and limited accessibility, hindering systematic modeling across property prediction tasks. Here, we present OpenPoly, a curated experimental polymer database derived from extensive literature mining and manual validation, comprising 3985 unique polymer-property data points spanning 26 key properties. We further develop a multi-task benchmarking framework that evaluates property prediction using four encoding methods and eight representative models. Our results highlight that the optimized degree-of-polymerization encoding coupled with Morgan fingerprints achieves an optimal trade-off between computational cost and accuracy. In data-scarce conditions, XGBoost outperforms deep learning models on key properties such as dielectric constant, glass transition temperature, melting point, and mechanical strength, achieving R² scores of 0.65-0.87. To further showcase the practical utility of the database, we propose potential polymers for two energy-relevant applications: high-temperature polymer dielectrics and fuel cell membranes. By offering a consistent and accessible benchmark and database, OpenPoly paves the way for more accurate polymer-property modeling and fosters data-driven advances in polymer genome engineering.
Keywords: Polymer database; polymer structure encoding; property prediction; functional reverse design; benchmark models
Optimization of the prompt fission neutron spectra of ²³⁹Pu(n,f) via criticality benchmarking
9
Authors: Jia-Hao Chen, Bo Yang, Qing-Gang Jia, Rui Li, Wen-Di Chen, Hai-Rui Guo, Wei-Li Sun, Tao Ye. Nuclear Science and Techniques, 2025, No. 9, pp. 139-149 (11 pages).
Prompt fission neutron spectra (PFNS) play a significant role in nuclear science and technology. In this study, the PFNS for ²³⁹Pu are evaluated using both differential and integral experimental data. A method that leverages integral criticality benchmark experiments to constrain the PFNS data is introduced. The measured central values of the PFNS are perturbed by constructing a covariance matrix. The PFNS are sampled using two types of covariance matrices: one generated from an assumed correlation matrix incorporating experimental uncertainties, and one derived directly from experimental reports. The joint Monte Carlo transport code is employed to perform transport simulations of five criticality benchmark assemblies using the perturbed PFNS data. Extensive simulations yield an optimized PFNS that shows improved agreement with the integral criticality benchmark experiments. This study introduces a novel approach for optimizing differential experimental data through integral experiments, particularly when a covariance matrix is not provided.
Keywords: Prompt fission neutron spectra; differential nuclear data; criticality benchmark; random sample; transport simulation
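The sampling step described in the abstract (perturbing PFNS central values through a covariance matrix built from an assumed correlation matrix and experimental uncertainties) can be sketched with NumPy. Everything concrete here is an assumption for illustration: the exponential correlation between energy bins, the 5% relative uncertainty, and the Maxwellian prior with T = 1.4 MeV standing in for measured data. The transport simulations of the benchmark assemblies are not shown.

```python
import numpy as np


def build_covariance(rel_unc: np.ndarray, spectrum: np.ndarray, corr_length: float = 3.0) -> np.ndarray:
    """Cov = D C D, with D = diag(rel_unc * spectrum) and an ASSUMED exponential
    correlation C_ij = exp(-|i - j| / corr_length) between energy bins."""
    sigma = rel_unc * spectrum
    idx = np.arange(len(spectrum))
    corr = np.exp(-np.abs(idx[:, None] - idx[None, :]) / corr_length)
    return corr * np.outer(sigma, sigma)


def sample_spectra(spectrum: np.ndarray, cov: np.ndarray, n: int, de: np.ndarray, seed: int = 0) -> np.ndarray:
    """Draw n perturbed spectra, clip negatives, renormalize each to unit area."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(spectrum, cov, size=n)
    samples = np.clip(samples, 0.0, None)                  # a spectrum cannot be negative
    samples /= (samples * de).sum(axis=1, keepdims=True)   # sum(chi * dE) = 1 per sample
    return samples


# Illustrative prior: Maxwellian with an assumed temperature, not evaluated data.
energies = np.linspace(0.05, 10.0, 200)                    # MeV, bin centres
de = np.full_like(energies, energies[1] - energies[0])     # uniform bin widths
T = 1.4                                                    # MeV, assumed
prior = 2 / np.sqrt(np.pi) * np.sqrt(energies) / T**1.5 * np.exp(-energies / T)
cov = build_covariance(np.full_like(prior, 0.05), prior)
draws = sample_spectra(prior, cov, n=100, de=de)
```

Each row of `draws` would then be fed to the transport code; the correlation length controls how smoothly neighbouring bins move together.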
Benchmark experiment system for ²⁵²Cf spontaneous fission source using γ tagging
10
Authors: Yu-Ting Wei, Chang-Lin Lan, Bo Gao, Xian-Lin Yang, Gong Jiang, Jia-Hao Wang, Bo Xie, Yang-Bo Nie, Yan-Liang Chang, Ge Zhang, Fan Wu, Kuo-Zhi Xu, Shi-Long Liu, Xi-Chao Ruan. Nuclear Science and Techniques, 2025, No. 11, pp. 221-231 (11 pages).
Benchmark experiments are indispensable for the development of evaluated neutron nuclear data libraries. Given the lack of domestic benchmarking of nuclear data in the fission energy region, this study developed a neutron leakage spectrum measurement system for spherical samples based on a ²⁵²Cf spontaneous fission source. An EJ309 detector (for high-energy measurements) and a CLYC detector (for low-energy measurements) were combined to measure the time-of-flight spectrum using the γ-tagging method. To assess the performance of the system, the time-of-flight spectrum without a sample was measured first. The experimental spectra were consistent with those simulated using the Monte Carlo method and with the standard ²⁵²Cf spectrum from ISO 8529-1, demonstrating that the system can effectively measure neutron events in the 0.15-8.0 MeV range. A spherical polyethylene sample was then used as a standard to verify the accuracy of the system for benchmark experiments. Simulation results were obtained using the Monte Carlo method with evaluated data from the ENDF/B-VIII.0, CENDL-3.2, JEFF-3.3, and JENDL-5 libraries. The measured neutron leakage spectra were compared with the corresponding simulated results in terms of spectrum shape and calculated C/E values. The simulated spectra from the different data libraries reproduced the experimental results well in the 0.15-8.0 MeV range. This study confirms that the ²⁵²Cf-based leakage neutron spectrum measurement system can perform benchmarking and provides a foundation for evaluating neutron nuclear data through benchmark experiments.
Keywords: ²⁵²Cf; neutron leakage spectrum; benchmark experiment; time-of-flight technique; evaluated nuclear data; spherical samples
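Two calculations underlie the measurement described above: converting a γ-tagged time of flight into neutron energy, and forming calculated-to-experimental (C/E) ratios over the 0.15-8.0 MeV window. The sketch below uses the relativistic kinetic-energy formula; the timing model (fission instant = gamma detection time minus the gamma's flight time) is a simplifying assumption that ignores detector offsets and walk corrections, and the helper names are hypothetical.

```python
import math

NEUTRON_REST_ENERGY_MEV = 939.565   # m_n c^2
SPEED_OF_LIGHT_M_S = 299_792_458.0


def neutron_flight_time(t_neutron_s: float, t_gamma_s: float, gamma_path_m: float) -> float:
    """TOF when the fission instant is tagged by a prompt gamma (simplified timing)."""
    t_fission = t_gamma_s - gamma_path_m / SPEED_OF_LIGHT_M_S
    return t_neutron_s - t_fission


def tof_to_energy_mev(flight_path_m: float, tof_s: float) -> float:
    """Relativistic kinetic energy: E = m c^2 (gamma - 1), with beta = L / (c t)."""
    beta = flight_path_m / (SPEED_OF_LIGHT_M_S * tof_s)
    if not 0.0 < beta < 1.0:
        raise ValueError("unphysical neutron velocity")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return NEUTRON_REST_ENERGY_MEV * (gamma - 1.0)


def c_over_e(energies_mev, calculated, measured, e_lo=0.15, e_hi=8.0):
    """(energy, C/E) pairs for bins inside [e_lo, e_hi] MeV with nonzero measurement."""
    if not len(energies_mev) == len(calculated) == len(measured):
        raise ValueError("inputs must have equal length")
    return [
        (e, c / m)
        for e, c, m in zip(energies_mev, calculated, measured)
        if e_lo <= e <= e_hi and m > 0
    ]
```

For a 1 m flight path, a 50 ns flight time corresponds to roughly 2.1 MeV; a C/E near 1 in a bin means the evaluated library reproduces the measured leakage there.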
Application of Benchmarking Technology in Automotive Development (Cited by: 16)
11
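Authors: 祁鹏华, 褚超美, 张轶. 机械设计与制造 (Machinery Design & Manufacture), PKU core journal, 2008, No. 10, pp. 64-66 (3 pages).
This paper introduces the concept of Benchmarking technology. Drawing on the history and current state of automotive development in China and abroad, and addressing China's insufficient experience and accumulated knowledge in automotive development, it introduces a Benchmarking-based technical system for analyzing sample vehicles, refines the sample-vehicle Benchmarking process, and explains how Benchmarking technology is applied. Finally, it describes the current application of Benchmarking technology in knowledge-base systems for automotive development.
Keywords: benchmarking technology; automotive development; knowledge base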
Benchmarking: An Important Means of Competitive Intelligence (Cited by: 27)
12
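Author: 张左之. 情报理论与实践 (Information Studies: Theory & Application), CSSCI, PKU core journal, 1995, No. 1, pp. 23-26 (4 pages).
Benchmarking is an important component of competitive intelligence, and its theory and methods developed alongside competitive intelligence itself. This paper gives a fairly comprehensive account of Benchmarking, covering its origins, basic concepts, types, and steps, in the hope of drawing attention to research on Benchmarking as an important means of competitive intelligence.
Keywords: competitive intelligence; enterprise information; benchmarking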
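Analyzing the Construction and Development of Our University's Information Management and Information Systems Program with the Benchmarking Method (Cited by: 3)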
13
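Authors: 赵海燕, 刘合翔. 科技信息 (Science & Technology Information), 2008, No. 16, pp. 14-15 (2 pages).
Based on the actual situation of the information management program at our university, this paper takes Shanghai Jiao Tong University as the comparison target and applies the Benchmarking method to compare aspects such as talent cultivation, management science, and social service. It proposes catch-up strategies and an implementation plan, and finally summarizes the paper's shortcomings.
Keywords: benchmarking; academic program; development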
Practical Teaching of NoSQL Databases: An Approach Incorporating Benchmarking (Cited by: 1)
14
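Authors: 叶枫, 孙骏, 黄倩, 李幽铮, 李凌. 软件导刊 (Software Guide), 2022, No. 11, pp. 162-165 (4 pages).
With the arrival of the era of big data and artificial intelligence, NoSQL databases are having an ever deeper impact on industry and academia, and cultivating students' ability to apply NoSQL in practice has become a key issue. However, constrained by credit allocations, laboratory environments, and other factors, universities currently offer too few related courses and teach them in insufficient depth, so students lack practical NoSQL database skills and the gap between their abilities and the big data skills industry needs keeps widening. To address this problem, this paper presents the team's experience over the past five years in teaching NoSQL databases in practice using a two-layer learning model. Incorporating the Benchmarking method, and based on students' course evaluations and feedback, it verifies that this teaching model achieved good practical teaching results.
Keywords: NoSQL; practical teaching; competence improvement; benchmarking method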
Surpassing Oneself: A Re-examination of Benchmarking (Cited by: 1)
15
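Author: 唐少清. 集团经济研究 (Group Economic Research), PKU core journal, 2007, No. 08X, pp. 117-118 (2 pages).
In English, Benchmarking originally means a “bench mark” (rendered in Chinese as 板凳标尺, “bench ruler”); Chinese translations include 定标赶超 (“set the standard and catch up”), 标尺 (“yardstick”), and the more figurative 见贤思齐 (“on seeing the worthy, strive to equal them”). In essence it is an enterprise management tool, namely a process of “learning, copying, and innovating”. There are many different accounts of Benchmarking.
Keywords: benchmarking; surpassing oneself; enterprise management; figurative rendering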
Ontology Matching with Multi-Task GP Using Adaptive Crossover and Combined Mutation
16
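Authors: 戴可涛, 吕青, 姜照航. 现代电子技术 (Modern Electronics Technique), PKU core journal, 2026, No. 4, pp. 155-164 (10 pages).
Ontology matching is an effective means of resolving ontology heterogeneity. To improve ontology matching quality and suppress bloat in genetic programming (GP), this paper proposes a multi-task GP algorithm with adaptive crossover and combined mutation that enables knowledge exchange between two task populations. Small trees are introduced to suppress bloat, and an auxiliary task population is used to guide the target task population out of local optima. The algorithm adopts a new inter-task adaptive crossover operator that chooses among crossover strategies according to the performance of individuals and their parents, so that the algorithm explores the search space thoroughly. In addition, a mutation operator based on combined probabilities is proposed to guide the target task population toward better mutations, and a new fitness function is designed to constrain tree size, optimizing matching performance while reducing tree size. Experiments on the OAEI benchmark test set (Benchmark) show that the proposed method achieves excellent matching performance on all test sets and outperforms other state-of-the-art methods.
Keywords: ontology matching; genetic programming algorithm; adaptive crossover operator; combined mutation; Benchmark; similarity features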
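The abstract mentions a fitness function that optimizes matching quality while constraining tree size, but does not give its form. A common way to do this in GP is parsimony pressure: subtract a small penalty proportional to tree size from the F-measure of the produced alignment. The sketch below shows that general technique only; the linear penalty and the weight `alpha` are hypothetical, not the authors' function.

```python
# Parsimony-pressure sketch: F-measure of an alignment minus a size penalty.
# The linear penalty and alpha value are assumptions for illustration.

Correspondence = tuple[str, str]  # (source entity, target entity)


def f_measure(found: set[Correspondence], reference: set[Correspondence]) -> float:
    """Harmonic mean of precision and recall against a reference alignment."""
    tp = len(found & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(found)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def penalised_fitness(found: set[Correspondence], reference: set[Correspondence],
                      tree_size: int, alpha: float = 0.001) -> float:
    """Higher is better; among equally accurate trees, smaller trees win."""
    return f_measure(found, reference) - alpha * tree_size
```

Because two trees with the same F-measure are separated only by size, selection gradually favours compact trees, which is the bloat-control effect the abstract describes.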
Using Generalized Benchmarking to Help Enterprises Overcome Difficulties and Take Off
17
Authors: 赵国杰, 杨光. 技术经济 (Technology Economics), 2000, No. 4, pp. 39-41 (3 pages).
Keywords: benchmarking; marketing strategy; competitors; enterprise
Integration of Large Language Models (LLMs) and Static Analysis for Improving the Efficacy of Security Vulnerability Detection in Source Code
18
Authors: José Armando Santas Ciavatta, Juan Ramón Bermejo Higuera, Javier Bermejo Higuera, Juan Antonio Sicilia Montalvo, Tomás Sureda Riera, Jesús Pérez Melero. Computers, Materials & Continua, 2026, No. 3, pp. 351-390 (40 pages).
Artificial intelligence (AI) continues to expand exponentially, particularly with the emergence of generative pre-trained transformers (GPT) based on the transformer architecture, which has revolutionized data processing and enabled significant improvements in various applications. This study investigates the detection of security vulnerabilities in source code using a range of large language models (LLMs). Our primary objective is to evaluate the effectiveness of Static Application Security Testing (SAST) by applying techniques such as prompt personas, structured outputs, and zero-shot prompting. The selected LLMs (CodeLlama 7B, DeepSeek coder 7B, Gemini 1.5 Flash, Gemini 2.0 Flash, Mistral 7b Instruct, Phi 38b Mini 128K instruct, Qwen 2.5 coder, StartCoder 27B) are compared and combined with Find Security Bugs. The evaluation uses a selected dataset containing vulnerabilities, and the results provide insights for different scenarios according to software criticality (business-critical, non-critical, minimum effort, best effort). Specifically, the main objectives of this study are to investigate whether LLMs outperform traditional static analysis tools, whether combining LLMs with SAST tools leads to an improvement, and whether local machine learning models running on an ordinary computer can produce reliable results. In summary, for business-critical software the results improved with LLM size, but the best results were obtained by SAST analysis. This differs in the “Non-Critical,” “Best Effort,” and “Minimum Effort” scenarios, where the combination of LLM (Gemini) + SAST obtained better results.
Keywords: AI+SAST; secure code; LLM benchmarking; LLM vulnerability detection
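The abstract compares SAST alone, LLMs alone, and LLM + SAST combinations, but does not state how findings are merged or scored. As a hedged sketch, assuming each finding is keyed by (file, line, CWE id) and that "combination" means taking the union (or, for a stricter variant, the intersection) of the two tools' findings, precision, recall, and F1 against ground truth can be computed as follows.

```python
# Assumed finding key and merge strategy; the paper's scoring may differ.
Finding = tuple[str, int, str]  # (file, line, CWE id)


def score(reported: set[Finding], truth: set[Finding]) -> dict[str, float]:
    """Precision, recall, and F1 of a set of reported findings."""
    tp = len(reported & truth)
    fp = len(reported - truth)
    fn = len(truth - reported)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def combine(sast: set[Finding], llm: set[Finding], mode: str = "union") -> set[Finding]:
    """Union raises recall (fits best-effort scans); intersection raises precision."""
    if mode == "union":
        return sast | llm
    if mode == "intersection":
        return sast & llm
    raise ValueError("mode must be 'union' or 'intersection'")
```

Which merge is preferable depends on the scenario: a union tolerates more false positives in exchange for missing fewer vulnerabilities, while an intersection does the opposite.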
A Survey on Medical Competence Evaluation Benchmarks for Large Language Models
19
Authors: Qiting Wang, Huiru Zou, Haobin Zhang, Yongshun Huang, Junzhang Tian, Weibin Cheng. Health Care Science, 2026, No. 1, pp. 4-18 (15 pages).
Large language models (LLMs) show considerable potential to revolutionize healthcare through their performance across diverse clinical applications. Given the inherent constraints of LLMs and the critical nature of medical practice, a rigorous and systematic evaluation of their medical competence is imperative. This study presents a comprehensive review of established methodologies and benchmarks for evaluating the medical competence of LLMs, including a thorough analysis of current assessment practices covering medical knowledge, clinical practice competence, and ethical-safety considerations. By integrating clinician competency assessment frameworks into LLM evaluation, we propose a structured tri-dimensional framework that systematically organizes existing evaluation approaches according to medical theoretical knowledge, clinical practice ability, and ethical-safety considerations. Furthermore, this research provides critical insights into future development trajectories while establishing foundational frameworks and standardization protocols for integrating LLMs into medical practice.
Keywords: Benchmark; large language model; medical competence
Performance Evaluation of Two-Level Cost Control Based on the DEA-Benchmarking Model
20
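Author: 周瑜. 财会月刊(中) (Finance and Accounting Monthly, mid-month edition), 2013, No. 9, pp. 36-38 (3 pages).
This paper applies the DEA-Benchmarking model to evaluate the performance of two-level cost control in manufacturing enterprises. Cost-related data from seven work teams in the machining workshop of enterprise DA are used to construct seven benchmark units for DEA evaluation, which assesses cost-control efficiency and verifies the scientific soundness and advanced nature of two-level cost control.
Keywords: cost control; DEA-Benchmarking; performance evaluation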
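The abstract does not specify which DEA variant is used. A minimal sketch of the standard input-oriented CCR envelopment model, solved with SciPy's `linprog`, is shown below: a unit scores 1.0 when no combination of peers produces at least its outputs with proportionally fewer inputs. The example data are invented for illustration (e.g. one cost input and one output for three hypothetical teams), not the seven-team data from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X: np.ndarray, Y: np.ndarray, o: int) -> float:
    """Input-oriented CCR efficiency of unit o.

    X: (m inputs, n units), Y: (s outputs, n units). Solves
        min theta  s.t.  X @ lam <= theta * X[:, o],  Y @ lam >= Y[:, o],  lam >= 0.
    """
    m, n = X.shape
    s = Y.shape[0]
    c = np.zeros(n + 1)
    c[0] = 1.0                                    # variables: [theta, lam_1..lam_n]
    a_in = np.hstack([-X[:, [o]], X])             # X lam - theta * x_o <= 0
    a_out = np.hstack([np.zeros((s, 1)), -Y])     # -Y lam <= -y_o
    a_ub = np.vstack([a_in, a_out])
    b_ub = np.concatenate([np.zeros(m), -Y[:, o]])
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.x[0])


if __name__ == "__main__":
    X = np.array([[1.0, 2.0, 4.0]])   # hypothetical input, e.g. cost per team
    Y = np.array([[1.0, 1.0, 2.0]])   # hypothetical output, e.g. qualified parts
    print([round(ccr_input_efficiency(X, Y, j), 3) for j in range(3)])
```

In a benchmarking reading, the units with efficiency 1.0 act as the "benchmarks", and the optimal `lam` weights of an inefficient unit indicate which peers it should emulate.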