Most material distribution-based topology optimization methods work on a relaxed form of the optimization problem and then push the solution toward the binary limits. However, when benchmarking these methods, researchers use known solutions to only a single form of benchmark problem. This paper proposes a comparison platform for systematic benchmarking of topology optimization methods using both binary and relaxed forms. A greyness measure is implemented to evaluate how far a solution is from the desired binary form. The well-known Zhou-Rozvany (ZR) problem is selected as the benchmarking problem here, making use of available global solutions for both its relaxed and binary forms. The recently developed non-penalization Smooth-edged Material Distribution for Optimizing Topology (SEMDOT), the well-established Solid Isotropic Material with Penalization (SIMP), and continuation methods are studied on this platform. Interestingly, in most cases, the grayscale solutions obtained by SEMDOT demonstrate better performance in dealing with the ZR problem than SIMP. The reasons are investigated and attributed to the usage of two different regularization techniques, namely, the Heaviside smooth function in SEMDOT and the power-law penalty in SIMP. More importantly, a simple-to-use benchmarking graph is proposed for evaluating newly developed topology optimization methods.
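Since the abstract refers to a greyness measure and contrasts SIMP's power-law penalty with SEMDOT's smooth Heaviside function, a minimal sketch of the standard textbook forms may be useful; the exact definitions used in the paper are not given in the abstract, so the discreteness measure, interpolation formulas, and all parameter values below are assumptions.

```python
import numpy as np

def greyness(rho):
    """Common discreteness (greyness) measure: 0 for a fully binary design,
    approaching 1 when every element sits at rho = 0.5.
    (Assumed form; the paper's exact greyness measure may differ.)"""
    return np.mean(4.0 * rho * (1.0 - rho))

def simp_stiffness(rho, p=3.0, E0=1.0, Emin=1e-9):
    """SIMP power-law interpolation of element stiffness."""
    return Emin + rho**p * (E0 - Emin)

def heaviside_projection(rho, beta=8.0, eta=0.5):
    """Smooth Heaviside projection of the kind used by non-penalization
    methods such as SEMDOT to push intermediate densities toward 0/1."""
    num = np.tanh(beta * eta) + np.tanh(beta * (rho - eta))
    den = np.tanh(beta * eta) + np.tanh(beta * (1.0 - eta))
    return num / den

rho = np.random.rand(100)  # hypothetical element densities
print(f"greyness before projection: {greyness(rho):.3f}")
print(f"greyness after projection:  {greyness(heaviside_projection(rho)):.3f}")
```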
The exponential growth of video content has driven significant advancements in video summarization techniques in recent years. Breakthroughs in deep learning have been particularly transformative, enabling more effective detection of key information and creating new possibilities for video synopsis. To summarize recent progress and accelerate research in this field, this paper provides a comprehensive review of deep learning-based video summarization methods developed over the past decade. We begin by examining the research landscape of video abstraction technologies and identifying core challenges in video summarization. Subsequently, we systematically analyze the prevailing deep learning frameworks and methodologies employed in current video summarization systems, offering researchers a clear roadmap of the field's evolution. Unlike previous review works, we first classify research papers based on the structural hierarchy of the video (from frame-level to shot-level to video-level), then further categorize them according to the summary backbone model (feature extraction and spatiotemporal modeling). This approach provides a more systematic and hierarchical organization of the literature. Following this comprehensive review, we summarize the benchmark datasets and evaluation metrics commonly employed in the field. Finally, we analyze persistent challenges and propose insightful directions for future research, providing a forward-looking perspective on video summarization technologies. This systematic literature review is of great reference value to new researchers exploring the fields of deep learning and video summarization.
The emergence of Medical Large Language Models has significantly transformed healthcare. Medical Large Language Models (Med-LLMs) serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets. It provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses the deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
Ensuring the general efficacy and benefit for human beings of medical Large Language Models (LLMs) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce “MedBench”, a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles the currently largest evaluation dataset (300,901 questions) covering 43 clinical specialties and performs multi-faceted evaluation of medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation between questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results that largely align with medical professionals’ perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs.
Development of irrigation infrastructure and its efficient management are primary concerns for sustainable food production. Assessing the creation of irrigation infrastructure, its utilization, and the diagnostic evaluation (monitoring) of various performance indices is important for measuring efficiency. Benchmarking of Irrigation Systems (BIS) is the diagnostic analysis of irrigation performance indicators comprising the Irrigation Infrastructure System (IIS), Agricultural System (AS), and Water Delivery Dynamics (WDD). Since the performance of an irrigation command varies in space and time, spatial information technologies, viz. Remote Sensing (RS), Geographical Information Systems (GIS), and Global Positioning Systems (GPS), are useful for providing spatial information on several indices in the benchmarking (BM) process. Information requirements for BIS at different stages and the use of spatial information technologies to derive irrigation performance indicators are discussed with suitable examples and demonstrated in this study. The studies carried out indicate that the geospatial approach to BIS enables improvements in data collection methods, diagnostic analysis, and spatio-temporal visualisation of BM indicators at the disaggregated canal level, which would be useful for decision support during corrective management measures. The conjunctive use of multi-date (medium-resolution) satellite data, high-spatial-resolution data, and field data on water deliveries was found to be an alternative to conventional non-spatial approaches to BIS, and thereby supports better water resources planning and management.
Due to increased tourist activity, many cities now have a large number of hotel buildings. It is necessary to establish measures of energy use intensity to effectively manage energy consumption in this sector. This study uses a combined strategy to establish an energy benchmark for hotel buildings in Vietnam. First, a survey and analysis of actual building stock data for 50 hotels in Danang, Vietnam, was conducted. The survey-based benchmark and its related data were then used to build a reference energy model to estimate an energy benchmark for other climatic regions in Vietnam using the energy simulation method. The results reveal that the average energy use intensity for hotels in Danang was 87.4 kWh/m²·year, or 8628.6 kWh/viproom.year. However, this study proposes that, because of differing expectations of comfort standards, hotels of different grades should have separate benchmarks. This study also proposes an energy intensity-based rating scale with 7 grades, from the least energy intensive (grade A) to the most energy intensive (grade G), which can be used to manage, label, or encourage sustainable energy use in hotel buildings. The relationship between energy use intensity and hotel occupancy rate was reported, compared, and explained; occupancy rate was found to have no significant impact on energy use intensity. From the survey results, predictive models were developed to estimate the annual energy consumption of hotel buildings based on their grades. Simulated benchmarks for other regions were also obtained. The results demonstrate many potential applications in the management, design and construction, and renovation of this building type.
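As a rough illustration of how such an energy-intensity rating scale could be applied, the sketch below maps an annual energy use intensity (EUI) to an A–G grade; the grade boundaries are purely hypothetical, since the abstract does not report them, and only the 87.4 kWh/m²·year average is taken from the study.

```python
# Minimal sketch of an energy-use-intensity (EUI) rating lookup.
# The A-G boundaries are illustrative placeholders; the paper's actual
# grade thresholds are not given in the abstract.
GRADE_BOUNDS = [(60, "A"), (75, "B"), (90, "C"), (105, "D"),
                (120, "E"), (135, "F")]  # upper limits in kWh/m2.year, hypothetical

def eui(annual_kwh: float, floor_area_m2: float) -> float:
    """Annual energy use intensity in kWh per m2 per year."""
    return annual_kwh / floor_area_m2

def grade(eui_value: float) -> str:
    """Return the first grade whose upper bound covers the EUI."""
    for upper, label in GRADE_BOUNDS:
        if eui_value <= upper:
            return label
    return "G"  # most energy intensive

# A hotel consuming 874,000 kWh/year over 10,000 m2 has an EUI of 87.4.
print(grade(eui(874_000, 10_000)))
```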
Medium-term air quality assessment, benchmarked against recent past data, can usefully complement short-term air quality index data for monitoring purposes. By using daily and monthly averaged data, medium-term air quality benchmarking provides a distinctive perspective for monitoring air quality from sustainability-planning and ecosystem perspectives. By normalizing the data for individual air pollutants to a standard scale, they can be more easily integrated to generate a daily combined local area benchmark (CLAB). The objective of the study is to demonstrate that medium-term air quality benchmarking can be tailored to reflect local conditions by selecting the most relevant pollutants to incorporate in the CLAB indicator. Such a benchmark can provide an overall air quality assessment for areas of interest. A case study is presented for Dallas County (U.S.A.), applying the proposed method by benchmarking 2020 air pollutant data against trends established for 2015 to 2019. The six air pollutants considered are ozone, carbon monoxide, nitrogen dioxide, sulfur dioxide, benzene, and particulate matter smaller than 2.5 micrometres. These pollutants are assessed individually and in terms of the CLAB, and their 2020 variations for Dallas County are compared to the daily trends established for 2015 to 2019. Reductions in benzene and carbon monoxide during much of 2020 are clearly discernible compared to preceding years. The CLAB indicator shows clear seasonal trends in air quality for 2015 to 2019, with higher pollution in winter and spring than in other seasons, strongly influenced by climatic variations with some anthropogenic inputs. Conducting CLAB analysis on an ongoing basis, using a relevant near-past time interval for benchmarking that covers several years, can reveal useful monthly, seasonal, and annual trends in overall air quality. This type of medium-term, benchmarked air quality data analysis is well suited for ecosystem monitoring.
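A minimal sketch of how individual pollutant readings might be normalized and combined into a daily CLAB value, assuming min-max scaling against the 2015–2019 baseline range and an unweighted mean; the paper's actual normalization and any pollutant weighting may differ, and all numbers below are placeholders.

```python
import numpy as np

def normalize_to_baseline(value, baseline_min, baseline_max):
    """Scale one pollutant reading to 0-1 against its 2015-2019 baseline range
    (assumed min-max normalization)."""
    return (value - baseline_min) / (baseline_max - baseline_min)

def clab(daily_by_pollutant, baselines):
    """Combined local area benchmark for one day: unweighted mean of the
    normalized pollutant scores (the study may weight pollutants differently)."""
    scores = [normalize_to_baseline(v, *baselines[k])
              for k, v in daily_by_pollutant.items()]
    return float(np.mean(scores))

# Hypothetical single-day readings and 2015-2019 baseline ranges.
day = {"O3": 0.041, "CO": 0.35, "NO2": 11.0,
       "SO2": 0.8, "benzene": 0.12, "PM2.5": 9.5}
base = {"O3": (0.01, 0.08), "CO": (0.1, 1.0), "NO2": (2.0, 40.0),
        "SO2": (0.1, 5.0), "benzene": (0.02, 0.5), "PM2.5": (2.0, 35.0)}
print(f"CLAB = {clab(day, base):.3f}")
```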
Advancing the integration of artificial intelligence and polymer science requires high-quality, open-source, and large-scale datasets. However, existing polymer databases often suffer from data sparsity, a lack of polymer-property labels, and limited accessibility, hindering systematic modeling across property prediction tasks. Here, we present OpenPoly, a curated experimental polymer database derived from extensive literature mining and manual validation, comprising 3985 unique polymer-property data points spanning 26 key properties. We further develop a multi-task benchmarking framework that evaluates property prediction using four encoding methods and eight representative models. Our results highlight that the optimized degree-of-polymerization encoding coupled with Morgan fingerprints achieves an optimal trade-off between computational cost and accuracy. In data-scarce conditions, XGBoost outperforms deep learning models on key properties such as dielectric constant, glass transition temperature, melting point, and mechanical strength, achieving R² scores of 0.65-0.87. To further showcase the practical utility of the database, we propose potential polymers for two energy-relevant applications: high-temperature polymer dielectrics and fuel cell membranes. By offering a consistent and accessible benchmark and database, OpenPoly paves the way for more accurate polymer-property modeling and fosters data-driven advances in polymer genome engineering.
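The encoding/model pairing the abstract singles out (Morgan fingerprints with XGBoost) can be sketched as follows; the repeat-unit SMILES, target values, and hyperparameters are placeholders, and the degree-of-polymerization encoding that OpenPoly couples with the fingerprints is omitted here.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from xgboost import XGBRegressor

def morgan_features(smiles: str, n_bits: int = 2048, radius: int = 2) -> np.ndarray:
    """Morgan fingerprint of a (capped) polymer repeat unit as a dense vector."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((n_bits,))
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

# Hypothetical capped repeat units and glass transition temperatures (K);
# real OpenPoly records would replace these toy values.
smiles = ["CC(C)c1ccccc1", "CC(C)C(=O)OC", "CC(C)Cl", "CC(C)C#N"]
tg_kelvin = [373.0, 378.0, 354.0, 378.0]

X = np.stack([morgan_features(s) for s in smiles])
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.05)
model.fit(X, np.array(tg_kelvin))
print(model.predict(X[:1]))  # predicted Tg for the first polymer
```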
Prompt fission neutron spectra (PFNS) have a significant role in nuclear science and technology. In this study, the PFNS for ²³⁹Pu are evaluated using both differential and integral experimental data. A method that leverages integral criticality benchmark experiments to constrain the PFNS data is introduced. The measured central values of the PFNS are perturbed by constructing a covariance matrix. The PFNS are sampled using two types of covariance matrices, either generated with an assumed correlation matrix and incorporating experimental uncertainties or derived directly from experimental reports. The joint Monte Carlo transport code is employed to perform transport simulations on five criticality benchmark assemblies using the perturbed PFNS data. Extensive simulations result in an optimized PFNS that shows improved agreement with the integral criticality benchmark experiments. This study introduces a novel approach for optimizing differential experimental data through integral experiments, particularly when a covariance matrix is not provided.
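A minimal sketch of the perturbation step, assuming the covariance matrix is built from an assumed correlation matrix and relative experimental uncertainties as the abstract describes; the energy grid, spectrum values, and correlation length below are placeholders, not the evaluated ²³⁹Pu data.

```python
import numpy as np

# Placeholder PFNS central values (arbitrary units) over a coarse energy grid
# and their relative experimental uncertainties.
energies = np.array([0.5, 1.0, 2.0, 4.0, 8.0])       # MeV
chi = np.array([0.30, 0.35, 0.22, 0.10, 0.03])       # central PFNS values
rel_unc = np.array([0.02, 0.02, 0.03, 0.05, 0.10])   # relative 1-sigma

# Assumed correlation: exponential decay with separation in log-energy.
dlog = np.abs(np.subtract.outer(np.log(energies), np.log(energies)))
corr = np.exp(-dlog)

# Covariance from correlations and absolute uncertainties, then sampling.
sigma = rel_unc * chi
cov = corr * np.outer(sigma, sigma)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(chi, cov, size=1000)

# Each perturbed spectrum is renormalized before being fed to the transport
# code; the sample giving the best agreement with the criticality benchmarks
# would be retained as the optimized PFNS.
samples /= samples.sum(axis=1, keepdims=True)
print(samples.shape, samples[0])
```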
Benchmark experiments are indispensable for the development of neutron nuclear data evaluation libraries. Given the lack of domestic benchmarking of nuclear data in the fission energy region, this study developed a neutron leakage spectrum measurement system using a spherical sample and a ²⁵²Cf spontaneous fission source. An EJ309 detector (for high-energy measurements) and a CLYC detector (for low-energy measurements) were combined to measure the time-of-flight spectrum using the γ-tagging method. To assess the performance of the system, the time-of-flight spectrum without a sample was measured first. The experimental spectra were consistent with those simulated using the Monte Carlo method and with the standard ²⁵²Cf spectrum from ISO 8529-1. This demonstrates that the system can effectively measure neutron events in the 0.15-8.0 MeV range. A spherical polyethylene sample was then used as the standard to verify the accuracy of the system for the benchmark experiment. The simulation results were obtained using the Monte Carlo method with evaluated data from the ENDF/B-VIII.0, CENDL-3.2, JEFF-3.3, and JENDL-5 libraries. The measured neutron leakage spectra were compared with the corresponding simulated results in terms of the neutron spectrum shape and the calculated C/E values. The results showed that the simulated spectra with the different data libraries reproduced the experimental results well in the 0.15-8.0 MeV range. This study confirms that the leakage neutron spectrum measurement system based on the ²⁵²Cf source can perform benchmarking and provides a foundation for evaluating neutron nuclear data through benchmark experiments.
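For readers unfamiliar with the measurement principle, a short sketch of converting a time-of-flight spectrum to neutron energy over a known flight path; the flight-path length and timing values are illustrative only, not the system's actual geometry.

```python
import numpy as np

NEUTRON_MASS_MEV = 939.565   # neutron rest mass, MeV/c^2
C_M_PER_NS = 0.299792458     # speed of light, m/ns

def tof_to_energy_mev(flight_path_m: float, tof_ns: np.ndarray) -> np.ndarray:
    """Relativistic conversion of neutron time of flight to kinetic energy:
    E = m_n (gamma - 1), with gamma from beta = (L / t) / c."""
    beta = flight_path_m / (tof_ns * C_M_PER_NS)
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    return NEUTRON_MASS_MEV * (gamma - 1.0)

tof = np.array([40.0, 60.0, 100.0, 200.0])   # ns, hypothetical channels
print(tof_to_energy_mev(1.0, tof))            # energies in MeV for a 1 m flight path
```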
Artificial Intelligence (AI) continues to expand exponentially, particularly with the emergence of generative pre-trained transformers (GPT) based on the transformer architecture, which have revolutionized data processing and enabled significant improvements in various applications. This work investigates security vulnerability detection in source code using a range of large language models (LLMs). Our primary objective is to evaluate the effectiveness of Static Application Security Testing (SAST) by applying techniques such as prompt personas, structured outputs, and zero-shot prompting. The selected LLMs (CodeLlama 7B, DeepSeek coder 7B, Gemini 1.5 Flash, Gemini 2.0 Flash, Mistral 7b Instruct, Phi 38b Mini 128K instruct, Qwen 2.5 coder, StartCoder 27B) are compared with, and combined with, Find Security Bugs. The evaluation uses a selected dataset containing vulnerabilities, and the results provide insights for different scenarios according to software criticality (business critical, non-critical, minimum effort, best effort). In detail, the main objectives of this study are to investigate whether large language models outperform traditional static analysis tools, whether combining LLMs with SAST tools leads to an improvement, and whether local machine learning models running on an ordinary computer can produce reliable results. Summarizing the most important conclusions of the research: although the results improve with the size of the LLM, for business-critical software the best results were obtained by SAST analysis. This differs in the “Non-Critical,” “Best Effort,” and “Minimum Effort” scenarios, where the combination of an LLM (Gemini) with SAST obtained better results.
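A hedged sketch of the zero-shot, persona-plus-structured-output prompting the abstract mentions; the prompt wording, JSON schema, and the call_llm stub are illustrative assumptions, not the authors' actual setup.

```python
import json

def build_zero_shot_prompt(source_code: str) -> str:
    """Zero-shot prompt combining a security-auditor persona with a
    structured-output (JSON) instruction; illustrative wording only."""
    schema = '{"findings": [{"line": <int>, "cwe": "<CWE-ID>", "description": "<text>"}]}'
    return (
        "You are an application security auditor performing static analysis.\n"
        "Review the code below and report every vulnerability you find.\n"
        f"Respond ONLY with JSON matching this schema: {schema}\n\n"
        "Code to review:\n"
        f"{source_code}\n"
    )

def parse_findings(model_reply: str) -> list:
    """Parse the structured output; an empty list means no parseable findings."""
    try:
        return json.loads(model_reply).get("findings", [])
    except json.JSONDecodeError:
        return []

snippet = "query = \"SELECT * FROM users WHERE name = '\" + user_input + \"'\""
prompt = build_zero_shot_prompt(snippet)
# reply = call_llm(prompt)  # hypothetical call to any of the evaluated LLMs
reply = '{"findings": [{"line": 1, "cwe": "CWE-89", "description": "SQL injection"}]}'
print(parse_findings(reply))
```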
Large language models (LLMs) show considerable potential to revolutionize healthcare through their performance across diverse clinical applications. Given the inherent constraints of LLMs and the critical nature of medical practice, a rigorous and systematic evaluation of their medical competence is imperative. This study presents a comprehensive review of the established methodologies and benchmarks for evaluating the medical competence of LLMs, encompassing a thorough analysis of current assessment practices across medical knowledge, clinical practice competence, and ethical-safety considerations. By integrating clinician competency assessment frameworks into LLM evaluation, we propose a structured tri-dimensional framework that systematically organizes existing evaluation approaches according to medical theoretical knowledge, clinical practice ability, and ethical-safety considerations. Furthermore, this research provides critical insights into future developmental trajectories while establishing foundational frameworks and standardization protocols for the integration of LLMs into medical practice.