The Pearl River Estuary (PRE) is one of China’s busiest shipping hubs and fishery production centers, as well as a region with abundant island tourism and wind energy resources, which calls for accurate short-term wind forecasts. First, this study evaluated three operational numerical models, i.e., ECMWF-EC, NCEP-GFS, and CMA-GD, for their ability to predict short-term wind speed over the PRE against in-situ observations during 2018-2021. Overall, ECMWF-EC outperforms the other models with an average RMSE of 2.24 m s^(-1) and R of 0.57, but NCEP-GFS performs better in the case of strong winds. Then, various bias correction and multi-model ensemble (MME) methods are used to perform deterministic post-processing using a local and lead-specific scheme. Two-factor model output statistics (MOS2) is the optimal bias correction method, reducing the overall RMSE to 1.62 m s^(-1) and increasing R to 0.70, which demonstrates the benefits of considering both initial and lead-specific information. Intercomparison of MME results reveals that multiple linear regression (MLR) presents superior skill, followed by random forest (RF), but it is slightly inferior to MOS2, particularly for the first few forecasting hours. Furthermore, the incorporation of additional features in MLR reduces the overall RMSE to 1.53 m s^(-1) and increases R to 0.74. Similarly, RF presents comparable results, and both outperform MOS2 in terms of correcting their deficiencies at the first few lead hours and limiting the error growth rate. Despite the satisfactory skill of deterministic post-processing techniques, they are unable to achieve a balanced performance between mean and extreme statistics. This highlights the necessity for further development of probabilistic forecasts.
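The two verification scores quoted in this abstract, RMSE and the correlation coefficient R, can be sketched in a few lines. The following is a minimal illustration only, assuming R denotes the Pearson correlation between paired forecast and observed wind speeds (the abstract does not spell this out). The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(forecast, observed):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(forecast) != len(observed) or len(forecast) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_f == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_f * var_o)


# Hypothetical wind speeds in m/s, for illustration only.
forecast_ms = [3.0, 5.0, 7.0, 9.0]
observed_ms = [2.0, 6.0, 6.0, 10.0]

print("RMSE:", rmse(forecast_ms, observed_ms))
print("R:", pearson_r(forecast_ms, observed_ms))
```

In a local, lead-specific scheme such as the one described, scores like these would presumably be computed separately for each station and forecast lead time before being averaged.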
The purpose of this paper is to explore the application of large language models (LLMs) in legal case retrieval and to evaluate their potential for providing legal professionals with more efficient work aids. Although pre-trained models have made great progress in legal case retrieval, they are often limited to specific types of law (e.g., criminal law or civil law) and lack the ability to generalize across different types of law. Moreover, most models can only deal with a single task, whereas legal case retrieval requires a model to have a strong comprehension of legal texts, involves multiple subtasks, and demands multitasking capabilities. Large language models, which have strong generalization and multitasking abilities, can therefore address these problems. To explore the application of large language models to legal case retrieval, this paper evaluates a series of emerging large language models, including multilingual models, homegrown large models, and models specifically designed for the legal domain. These models are applied to legal case retrieval and its associated subtasks. Based on the Supreme People’s Court definition, the legal case retrieval task is broken down into seven subtasks: event detection, fact generation, trigger word extraction, keyword extraction, summarization, dispute focus identification, and reasoning generation. Using a variety of evaluation metrics, the experiments demonstrate that these emerging models have significant potential in the field of legal case retrieval, even with few-shot samples. The research in this paper not only introduces new ideas in the field of legal case retrieval, but also empirically verifies the potential of LLMs to improve the quality and efficiency of retrieval. It demonstrates the value of large language models in this field and is expected to significantly enhance the efficiency of legal practitioners, as well as promote the consistency and fairness of legal judgments through the use of emerging technologies.
This study evaluates the 1995-2020 global ocean-sea ice simulation using the unstructured-mesh Model for Prediction Across Scales (MPAS) ocean/sea ice model within the Energy Exascale Earth System Model (E3SM) version 2.1 (E3SMv2-MPAS) at 60 km to 10 km resolution. Multi-source observational data are utilized to validate sea surface temperature/salinity, sea ice, three-dimensional thermal-saline structures, mixed layer depth, ocean heat content, and sea surface height. Key results show the following: (1) E3SMv2-MPAS captures seasonal-to-decadal variability in surface fields and sea ice, but shows systematic biases in sea surface temperature of western boundary currents (attributed to inadequate eddy parameterization) and in Arctic sea surface salinity (attributed to misrepresented freshwater fluxes and mixing processes). (2) The model robustly represents three-dimensional climate variability, yet underestimates mixed layer depth in key regions (Antarctic Circumpolar Current and North Atlantic), revealing deficiencies in extreme mixing. (3) Ocean heat content distributions are well simulated. (4) Sea surface height spatial patterns and interannual variability are accurately reproduced. This work identifies critical refinements for unstructured-mesh models: mesoscale eddy parameterization, polar ocean-sea ice coupling, and multi-scale energy processes, advancing high-resolution climate model development and laying the groundwork for improved ocean forecasting systems.
Background: Large language models (LLMs) have shown considerable promise in supporting clinical decision-making. However, their adoption and evaluation in dermatology remain limited. This study aimed to explore the preferences of Chinese dermatologists regarding LLM-generated responses in clinical psoriasis scenarios and to assess how they prioritize key quality dimensions, including accuracy, traceability, and logicality. Methods: A cross-sectional, web-based survey was conducted between December 25, 2024, and January 22, 2025, following the Checklist for Reporting Results of Internet E-Surveys guidelines. A total of 1247 valid responses were collected from practicing dermatologists across 33 of China's provincial-level administrative divisions. Participants evaluated responses to five categories of clinical questions (etiology, clinical presentation, differential diagnosis, treatment, and case study) generated by five LLMs: ChatGPT-4o, Kimi.ai, Doubao, ZuoYiGPT, and Lingyi-agent. Statistical associations between participant characteristics and model preferences were examined using chi-square tests. Results: ChatGPT-4o (Model 1) emerged as the most preferred model across all clinical tasks, consistently receiving the highest number of votes in case study (n=740), clinical presentation (n=666), differential diagnosis (n=707), etiology (n=602), and treatment (n=656). Significant variation in model preference by professional title was observed only for the differential diagnosis task (χ^(2)=21.13, df=12, p=0.0485), while no significant differences were found across hospital tiers (p>0.05). In terms of evaluation dimensions, accuracy was most frequently rated as “very important” (n=635). A significant association existed between hospital tier and the most valued dimension (χ^(2)=27.667, df=9, p=0.0011), with dermatologists in primary hospitals prioritizing traceability more than their peers in higher-tier hospitals. No significant associations were found across professional titles (p=0.127). Conclusions: Chinese dermatologists show a strong preference for ChatGPT-4o over domestic LLMs in psoriasis-related clinical tasks. While accuracy remains the primary criterion, traceability and logicality are also critical, particularly for clinicians in lower-tier hospitals. These findings suggest that future clinical LLMs should prioritize not only content accuracy but also source transparency and structural clarity to meet the diverse needs of different clinical settings.
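The chi-square tests reported in this abstract compare observed vote counts with the counts expected if preference were independent of participant group. The sketch below shows how the statistic and its degrees of freedom are obtained for a contingency table. It assumes the standard Pearson chi-square test of independence, and the function name and the example counts are hypothetical, not survey data. For reference, a table with 4 rows and 5 columns gives df = (4-1)×(5-1) = 12, which would be consistent with the reported df=12, although the abstract does not state the category counts.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows, each a list of non-negative observed counts.
    """
    if len(table) < 2 or len(table[0]) < 2:
        raise ValueError("table must have at least 2 rows and 2 columns")
    if any(len(row) != len(table[0]) for row in table):
        raise ValueError("all rows must have the same number of columns")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column total is zero")
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 table of counts, for illustration only.
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(stat, dof)
```

Turning the statistic into a p-value requires the chi-square distribution with the computed degrees of freedom, which a statistics library would normally supply.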
With the rapid development of generative artificial intelligence technologies, represented by large language models, university-level computer science education is undergoing a critical transition from knowledge-based instruction to competency-oriented teaching. A postgraduate student competency evaluation model can serve as a framework to organize and guide both teaching and research activities at the postgraduate level. A number of relevant research efforts have already been conducted in this area. Graduate education plays a vital role not only as a continuation and enhancement of undergraduate education but also as essential preparation for future research endeavors. An analysis of the acceptance of competency evaluation models assesses how various stakeholders perceive the importance of different components within the model. Investigating the degree of acceptance among diverse groups (such as current undergraduate students, current postgraduate students, graduates with less than three years of work experience, and those with more than three years of work experience) can offer valuable insights for improving and optimizing postgraduate education and training practices.
With the continuous development of the nursing discipline, standardized nurse training has always been a crucial link in the development of nursing science and plays an irreplaceable role in talent cultivation. However, in the current standardized training for some nurses, there are problems such as the simplification of nursing skill evaluation models and insufficient post competence of nurses. Therefore, optimizing the training model for nursing talents has become an inevitable measure. The problem-based learning (PBL) method and the Direct Observation of Procedural Skills (DOPS) evaluation model provide new directions and guidance for the development of training. Against this background, this paper explores effective approaches for standardized nurse training, starting from basic concepts and gradually delving into specific practical paths, aiming to improve the quality of talent cultivation and provide valuable references for other researchers.
Objective: To explore the application value of a new empowerment teaching method based on Kirkpatrick’s evaluation model in teaching Chinese medicine nursing in otorhinolaryngology. Methods: A total of 60 nurses who practiced in the otolaryngology department of our hospital from June 2022 to October 2024 were included in the study and equally divided into two groups using a convenience sampling method. The 30 nurses who chose traditional Chinese medicine skill teaching management were included in the control group, and the 30 nurses who chose the new empowerment teaching method based on Kirkpatrick’s evaluation model were included in the observation group. Relevant indicators such as clinical teaching environment perception, theoretical knowledge scores of Chinese medicine nursing, and excellent rate of practical operation assessment were compared. Results: The nurses in the observation group had higher scores for clinical teaching environment perception than the control group (P<0.05). In addition, the midterm and final exam scores for theoretical knowledge of Chinese medicine nursing were higher in the observation group than in the control group (P<0.05). Compared with the control group, the observation group had a higher excellent rate of practical operation assessment (93.33% vs. 73.33%) and a higher Chinese medicine nursing ability score [(215.69±19.73) points vs. (184.87±15.66) points] (P<0.05). Conclusion: Applying the new empowerment teaching method based on Kirkpatrick’s evaluation model to Chinese medicine nursing teaching in otolaryngology can help nurses understand the theoretical knowledge of Chinese medicine nursing and optimize the clinical teaching environment, thereby promoting their practical skills and Chinese medicine nursing abilities.
This paper proposes a quality evaluation model for software talent cultivation based on multivariate data fusion. The model constructs a comprehensive ability and quality evaluation index system for college students from the perspective of engineering courses, especially software engineering. As for the evaluation method, we rely on the behavioral data of students during their school years and aim to make the evaluation model as objective as possible, effectively weakening the negative impact of personal subjective assumptions on the evaluation results.
A high-resolution modeling approach is increasingly being considered a necessary step for improving the monitoring and prediction of regional air quality. This is especially true for highly urbanized regions with complex terrain and land use. This study uses the Community Multiscale Air Quality (CMAQ) model coupled with the MM5 mesoscale model for a comprehensive analysis to assess the suitability of such a high-resolution modeling system for predicting ozone air quality in the complex terrain of Osaka, Japan. The 1-km and 3-km grid domains were nested inside a 9-km domain, and the domain with the 1-km grid covered the Osaka region. High-resolution Grid Point Value-Mesoscale Model (GPV-MSM) data were used after suitable validation. The simulated ozone concentrations were validated and evaluated with statistical metrics using performance criteria set for ozone. Daily maxima of ozone were found to be better simulated by the 1-km grid domain than by the coarser 9-km and 3-km domains, with the maximum improvement in the mean absolute gross error being about 3 ppbv. In addition, the 1-km grid results fared better than the other grids at most of the observation stations that showed noticeable differences in gross error as well as correlation. These results amply justify the use of the integrated high-resolution MM5-CMAQ modeling system in highly urbanized regions, such as the Osaka region, that have complex terrain and land use.
BACKGROUND Gestational diabetes mellitus (GDM) is a condition characterized by high blood sugar levels during pregnancy. The prevalence of GDM is on the rise globally, and this trend is particularly evident in China, where it has emerged as a significant issue impacting the well-being of expectant mothers and their fetuses. Identifying and addressing GDM in a timely manner is crucial for maintaining the health of both expectant mothers and their developing fetuses. Therefore, this study aims to establish a risk prediction model for GDM and explore the effects of serum ferritin, blood glucose, and body mass index (BMI) on the occurrence of GDM. AIM To develop a risk prediction model to analyze factors leading to GDM, and evaluate its efficiency for early prevention. METHODS The clinical data of 406 pregnant women who underwent routine prenatal examination in Fujian Maternity and Child Health Hospital from April 2020 to December 2022 were retrospectively analyzed. According to whether GDM occurred, they were divided into two groups to analyze the related factors affecting GDM. Then, according to the weight of the relevant risk factors, the training set and the verification set were divided at a ratio of 7:3. Subsequently, a risk prediction model was established using logistic regression and random forest models, and the model was evaluated and verified. RESULTS Pre-pregnancy BMI, previous history of GDM or macrosomia, hypertension, hemoglobin (Hb) level, triglyceride level, family history of diabetes, serum ferritin, and fasting blood glucose levels during early pregnancy were found to have a significant impact on the development of GDM (P<0.05). For the nomogram model’s prediction of GDM in pregnancy, the area under the curve (AUC) was 0.883 [95% confidence interval (CI): 0.846-0.921], and the sensitivity and specificity were 74.1% and 87.6%, respectively. The top five variables in the random forest model for predicting the occurrence of GDM were serum ferritin, fasting blood glucose in early pregnancy, pre-pregnancy BMI, Hb level, and triglyceride level. The random forest model achieved an AUC of 0.950 (95% CI: 0.927-0.973), with a sensitivity of 84.8% and a specificity of 91.4%. The Delong test showed that the AUC value of the random forest model was higher than that of the decision tree model (P<0.05). CONCLUSION The random forest model is superior to the nomogram model in predicting the risk of GDM. This method is helpful for early diagnosis and appropriate intervention of GDM.
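The performance figures in this abstract (sensitivity, specificity, and AUC) can be illustrated with a short sketch. It is not the study's implementation. It assumes binary labels with 1 marking a GDM case, and it computes AUC through the rank interpretation, i.e., the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. All function names and sample values are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


def auc_from_scores(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in y_true")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels, predictions, and scores, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(labels, predicted))
print(auc_from_scores([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]))
```

The pairwise AUC loop is quadratic in the sample size, which is acceptable for a cohort of a few hundred records such as the 406 women described here. A sorting-based formulation would be preferable for much larger data sets.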
The planetary boundary layer turbulence and moist convection parameterizations have been modified recently in the NASA Goddard Institute for Space Studies (GISS) Model E2 atmospheric general circulation model (GCM; post-CMIP5, hereafter P5). In this study, single column model (SCM_P5) simulated cloud fractions (CFs), cloud liquid water paths (LWPs) and precipitation were compared with Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) ground-based observations made during the period 2002-08. CMIP5 SCM simulations and GCM outputs over the ARM SGP region were also used in the comparison to identify whether the causes of cloud and precipitation biases resulted from either the physical parameterization or the dynamic scheme. The comparison showed that the CMIP5 SCM has difficulties in simulating the vertical structure and seasonal variation of low-level clouds. The new scheme implemented in the turbulence parameterization led to significantly improved cloud simulations in P5. It was found that the SCM is sensitive to the relaxation time scale. When the relaxation time increased from 3 to 24 h, SCM_P5-simulated CFs and LWPs showed a moderate increase (10%-20%) but precipitation increased significantly (56%), which agreed better with observations despite the less accurate atmospheric state. Annual averages among the GCM and SCM simulations were almost the same, but their respective seasonal variations were out of phase. This suggests that the same physical cloud parameterization can generate similar statistical results over a long time period, but different dynamics drive the differences in seasonal variations. This study can potentially provide guidance for the further development of the GISS model.
The simulated Arctic sea ice drift and its relationship with the near-surface wind and surface ocean current during 1979-2014 in nine models from China that participated in the sixth phase of the Coupled Model Intercomparison Project (CMIP6) are examined by comparison with observational and reanalysis datasets. Most of the models reasonably represent the Beaufort Gyre (BG) and Transpolar Drift Stream (TDS) in the spatial patterns of their long-term mean sea ice drift, while the detailed location, extent, and strength of the BG and TDS vary among the models. About two-thirds of the models agree with the observation/reanalysis in the sense that the sea ice drift pattern is consistent with the near-surface wind pattern. About the same proportion of models shows that the sea ice drift pattern is consistent with the surface ocean current pattern. In the observation/reanalysis, however, the sea ice drift pattern does not match well with the surface ocean current pattern. All nine models miss the observed widespread acceleration of sea ice drift speed across the Arctic. For the Arctic basin-wide spatial average, five of the nine models overestimate the Arctic long-term (1979-2014) mean sea ice drift speed in all months. Only FGOALS-g3 captures a significant sea ice drift speed increase from 1979 to 2014 in both spring and autumn, and the increases are weaker than those in the observation. This evaluation helps assess the performance of the Arctic sea ice drift simulations in these CMIP6 models from China.
Background: Bladder cancer poses a great burden on society, and its high rate of recurrence and treatment failure necessitates the use of appropriate animal models to study its pathogenesis and test novel treatments. Orthotopic models are superior to other types since they provide a normal microenvironment. Four methods are described for developing bladder cancer models inside the animal’s bladder. Direct intramural injection is one of these methods and is widely used. However, its efficacy in model development has not yet been studied. We aimed to evaluate the efficacy and success rate of the direct intramural injection method of developing an orthotopic model for the study of bladder cancer. Method: Tumor cell lines were prepared in four microtubes. Aliquots of 200×10^(3) cells were injected through a 27-gauge needle into the ventral wall of the bladders of 4 male and 4 female BALB/c mice following a midline 1 cm laparotomy incision. In addition, 1 million cells from each microtube were injected into the flanks of control mice. To prevent infection and alleviate pain, 5 mg/kg enrofloxacin and 2.5 mg/kg flunixin meglumine, respectively, were injected subcutaneously. Results: Tumors formed in all mice, resulting in a 100% take rate and zero post-operation mortality. Surgery time was ≤15 min per mouse. In two mice, tumors were found in the peritoneal space as well. Conclusion: Direct intramural injection is a rapid, reliable, and reproducible method for developing orthotopic models of bladder cancer. It can be done on both male and female mice and only requires readily available surgical tools. However, the needle track can result in cell spillage and peritoneal tumors.
The Los Alamos Sea-Ice Model (CICE) is one of the most popular sea-ice models. All versions of it have been the main sea-ice module coupled to climate system models. Therefore, evaluating their simulation capability is an important step in developing climate system models. The advantages of CICE6.0 (the latest version) are analyzed in this paper by comparison with observations and previous versions (CICE4.0 and CICE5.0). It is found that CICE6.0 has the minimum interannual errors, and the seasonal cycle it simulates is the most consistent with observations. CICE4.0 severely overestimates winter sea ice and underestimates summer sea ice. Meanwhile, the errors of CICE5.0 in winter are larger than those of the other versions. The main attention is paid to the perennial ice and the seasonal ice. The spatial distribution of root-mean-square errors indicates that the simulated errors are distributed in the Atlantic sector and the outer Arctic. Both CICE4.0 and CICE5.0 underestimate the concentration of the perennial ice and overestimate that of the seasonal ice in these areas, whereas CICE6.0 largely resolves this problem. Moreover, the decadal trends it simulates are comparatively the best, especially in the central Arctic sea. The other versions underestimate the decadal trend of the perennial ice and overestimate that of the seasonal ice. In addition, an index used to objectively describe the difference in the spatial distribution between the simulation and observation shows that CICE6.0 produces the best simulated spatial distribution.
The climatological mean state, seasonal variation and long-term upward trend of 1979-2005 latent heat flux (LHF) in historical runs of 14 coupled general circulation models from CMIP5 (Coupled Model Intercomparison Project Phase 5) are evaluated against OAFlux (Objectively Analyzed air-sea Fluxes) data. Inter-model diversity of these models in simulating the annual mean climatological LHF is discussed. Results show that the models can capture the climatological LHF fairly well, but the amplitudes are generally overestimated. Model-simulated seasonal variations of LHF match well with observations, again with overestimated amplitudes. The possible origins of these biases are wind speed biases in the CMIP5 models. Inter-model diversity analysis shows that the overall stronger or weaker LHF over the tropical and subtropical Pacific region, and the meridional variability of LHF, are the two most notable diversities of the CMIP5 models. Regression analysis indicates that the inter-model diversity may come from the diversity of simulated SST and near-surface atmospheric specific humidity. Compared with the observed long-term upward trend, the trends of LHF and wind speed are largely underestimated, while trends of SST and air specific humidity are grossly overestimated, which may be the origins of the model biases in reproducing the trend of LHF.
Background: An increasing number of ecological processes have been incorporated into Earth system models. However, model evaluations usually lag behind the fast development of models, leading to a pervasive simulation uncertainty in key ecological processes, especially the terrestrial carbon (C) cycle. Traceability analysis provides a theoretical basis for tracking and quantifying the structural uncertainty of simulated C storage in models. Thus, a new model evaluation tool based on traceability analysis is urgently needed to efficiently diagnose the sources of inter-model variations in the terrestrial C cycle in Earth system models. Methods: A new cloud-based model evaluation platform, i.e., the online traceability analysis system for model evaluation (TraceME v1.0), was established. TraceME was applied to analyze the uncertainties of seven models from the Coupled Model Intercomparison Project Phase 6 (CMIP6). Results: TraceME can effectively diagnose the key sources of different land C dynamics among CMIP6 models. For example, the analyses based on TraceME showed that the estimation of global land C storage varied about 2.4-fold across the seven CMIP6 models. Among all models, IPSL-CM6A-LR simulated the lowest land C storage, which mainly resulted from its shortest baseline C residence time. Over the historical period of 1850–2014, gross primary productivity and baseline C residence time were the major uncertainty contributors to the inter-model variation in ecosystem C storage in most land grid cells. Conclusion: TraceME can facilitate model evaluation by identifying sources of model uncertainty and provides a new tool for the next generation of model evaluation.
The simulations of the Arctic Intermediate Water in four datasets of climate models and reanalyses, CCSM3, CCSM4, SODA and GLORYS, are analyzed and evaluated. The climatological core temperatures and depths in both CCSM models exhibit deviations of over 0.5°C and 200 m from the PHC. The SODA reanalysis reproduces relatively reasonable spatial patterns of core temperature and depth, while GLORYS, another reanalysis, shows a remarkable cooling and deepening drift compared with the result at the beginning of the dataset, especially in the Eurasian Basin (about 2°C). The heat contents at the depth of intermediate water in the CCSM models are overestimated, with large positive errors nearly twice that in the PHC. In contrast, GLORYS in 2009 shows a negative error of similar magnitude, which means the characteristic of the water mass is totally lost. The circulations in the two reanalyses at the depth of intermediate water are more energetic and realistic than those in the CCSMs, which is attributed to the horizontal eddy-permitting resolution. The velocity fields and the transports in the Fram Strait are also investigated, and the necessity of finer horizontal resolution is concluded again. The northward volume transports are much larger in the two reanalyses, although they are still weak compared with mooring observations. Finally, the impact of assimilation is investigated, with evidence of heat input from assimilation. This is thought to be a reason for the good performance of SODA, while GLORYS drifts dramatically without assimilation data in the Arctic Ocean.
The emergence of Medical Large Language Models (Med-LLMs) has significantly transformed healthcare. Med-LLMs serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets. It provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
[Objective] The aim was to study on RBF model about evaluation on carrying capacity of water resources based on standardized indices. [Method] The indices were transformed and the averages of standard values in differ...[Objective] The aim was to study on RBF model about evaluation on carrying capacity of water resources based on standardized indices. [Method] The indices were transformed and the averages of standard values in different levels were taken as the standardized values of components of central vectors for basic functions of RBF hidden nodes. Hence, the basic functions are suitable for most indices, simplifying expression and calculation of basic functions. [Result] RBF models concluded through Monkey-king Genetic Algorithm with weights optimization are used in evaluation on water carrying capacity in three districts in Changwu County in Shaanxi Province, which were in consistent with that through fuzzy evaluation. [Conclusion] RBF, simple and practical, is universal and popular.展开更多
The extreme rainfall forecast performances of both of ECMWF-IFS and ECMWF-EPS with MET version 5.1 were examined in landing Typhoon Soudelor(1513) with 60 h lead times. In this study the programs for converting ECMWF&...The extreme rainfall forecast performances of both of ECMWF-IFS and ECMWF-EPS with MET version 5.1 were examined in landing Typhoon Soudelor(1513) with 60 h lead times. In this study the programs for converting ECMWF's forecast data(both of ECMWF-IFS and ECMWF-EPS) format into that needed by MET were developed. Also, during landfall, the observed maximum 6-hour accumulated rainfall was investigated, and then the verification of extreme rainfall in Soudelor was carried out. Results showed that while traditional verification gives relatively low scores, by the method for object-based diagnostic evaluation(MODE) the significant rainy areas were well predicted in this case study.展开更多
Funding: Science and Technology Research Project of Guangdong Meteorological Service (GRMC2021M19, GRMC2022Q16, GRMC2023M29).
Abstract: The Pearl River Estuary (PRE) is one of China’s busiest shipping hubs and fishery production centers, as well as a region with abundant island tourism and wind energy resources, which calls for accurate short-term wind forecasts. First, this study evaluates three operational numerical models, i.e., ECMWF-EC, NCEP-GFS, and CMA-GD, for their ability to predict short-term wind speed over the PRE against in-situ observations during 2018-2021. Overall, ECMWF-EC outperforms the other models with an average RMSE of 2.24 m s^(-1) and R of 0.57, but NCEP-GFS performs better in the case of strong winds. Then, various bias correction and multi-model ensemble (MME) methods are used to perform deterministic post-processing with a local and lead-specific scheme. Two-factor model output statistics (MOS2) is the optimal bias correction method, reducing the overall RMSE to 1.62 m s^(-1) and increasing R to 0.70, which demonstrates the benefit of considering both initial and lead-specific information. Intercomparison of the MME results reveals that multiple linear regression (MLR) is the most skillful, followed by random forest (RF), but MLR is slightly inferior to MOS2, particularly for the first few forecast hours. Furthermore, incorporating additional features in MLR reduces the overall RMSE to 1.53 m s^(-1) and increases R to 0.74. RF yields comparable results, and both outperform MOS2, correcting their earlier deficiencies at the first few lead hours and limiting the error growth rate. Despite the satisfactory skill of deterministic post-processing techniques, they are unable to achieve a balanced performance between mean and extreme statistics. This highlights the need for further development of probabilistic forecasts.
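The two skill scores quoted in this abstract, RMSE and the correlation coefficient R, can be illustrated with a minimal sketch. It assumes R is the Pearson correlation between paired forecasts and observations; the function names and the sample wind speeds are hypothetical and are not taken from the study.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error between paired forecasts and observations."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(forecast, observed):
    """Pearson correlation coefficient between forecasts and observations."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    cov = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)


# Hypothetical wind speeds in m/s (illustrative values only).
forecast = [3.0, 5.0, 7.0, 9.0]
observed = [2.0, 6.0, 6.0, 10.0]

print(f"RMSE = {rmse(forecast, observed):.2f} m/s")
print(f"R    = {pearson_r(forecast, observed):.2f}")
```

In a lead-specific scheme such as the one the abstract describes, scores like these would presumably be computed separately for each station and forecast lead time before averaging.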
Funding: Supported by the Large-scale Industry Model Evaluation Capability Development (CXFZ2024004), the National Social Science Foundation of China (22ZD035), and the Research Innovation Project Plan of China University of Political Science and Law (24KYGH021).
Abstract: The purpose of this paper is to explore the application of large language models (LLMs) in legal case retrieval and to evaluate their potential for providing legal professionals with more efficient work aids. Although pre-trained models have made great progress in legal case retrieval, they are often limited to specific types of law (e.g., criminal law, civil law) and lack the ability to generalize across different types of law. Moreover, most models can handle only a single task, whereas legal case retrieval requires a thorough comprehension of legal texts, involves multiple subtasks, and therefore demands multitasking capabilities. Large language models, with their strong generalization and multitasking abilities, are well placed to address these problems. To explore their application to legal case retrieval, this paper evaluates a series of emerging large language models, including multilingual models, homegrown large models, and models specifically designed for the legal domain. These models are applied to legal case retrieval and its associated subtasks. Based on the Supreme People’s Court definition, the legal case retrieval task is broken down into seven subtasks: event detection, fact generation, trigger word extraction, keyword extraction, summarization, dispute focus identification, and reasoning generation. Using a variety of evaluation metrics, the experiments demonstrate that these emerging models have significant potential in legal case retrieval, even with few-shot samples. The research not only introduces new ideas in the field of legal case retrieval but also empirically verifies the potential of LLMs to improve the quality and efficiency of retrieval. It demonstrates the value of large language models in this field and is expected to significantly enhance the efficiency of legal practitioners, as well as promote the consistency and fairness of legal judgments through the use of emerging technologies.
Funding: The National Key R&D Program of China under contract No. 2021YFC3101503; the Science and Technology Innovation Program of Hunan Province under contract No. 2022RC3070; the National Natural Science Foundation of China under contract Nos. 42305176 and 42276205; the Hunan Provincial Natural Science Foundation of China under contract No. 2023JJ10053.
Abstract: This study evaluates the 1995-2020 global ocean-sea ice simulation produced by the unstructured-mesh Model for Prediction Across Scales (MPAS) ocean/sea ice model within the Energy Exascale Earth System Model (E3SM) version 2.1 (E3SMv2-MPAS) at 60 km to 10 km resolution. Multi-source observational data are used to validate sea surface temperature and salinity, sea ice, three-dimensional thermal-saline structures, mixed layer depth, ocean heat content, and sea surface height. Key results are as follows: (1) E3SMv2-MPAS captures seasonal-to-decadal variability in surface fields and sea ice, but shows systematic biases in the sea surface temperature of western boundary currents (inadequate eddy parameterization) and in Arctic sea surface salinity (misrepresented freshwater fluxes and mixing processes). (2) The model robustly represents three-dimensional climate variability, yet underestimates mixed layer depth in key regions (the Antarctic Circumpolar Current and the North Atlantic), revealing deficiencies in extreme mixing. (3) Ocean heat content distributions are well simulated. (4) Sea surface height spatial patterns and interannual variability are accurately reproduced. This work identifies critical refinements for unstructured-mesh models: mesoscale eddy parameterization, polar ocean-sea ice coupling, and multi-scale energy processes. These refinements advance high-resolution climate model development and lay the groundwork for improved ocean forecasting systems.
Funding: National Key Research and Development Program of China, Grant/Award Number: 2024YFF0507404; Special Clinical Business Fund for High-Level Hospitals of China-Japan Friendship Hospital, Grant/Award Number: 2024-NHLHCRF-TS-01.
Abstract: Background: Large language models (LLMs) have shown considerable promise in supporting clinical decision-making. However, their adoption and evaluation in dermatology remain limited. This study aimed to explore the preferences of Chinese dermatologists regarding LLM-generated responses in clinical psoriasis scenarios and to assess how they prioritize key quality dimensions, including accuracy, traceability, and logicality. Methods: A cross-sectional, web-based survey was conducted between December 25, 2024, and January 22, 2025, following the Checklist for Reporting Results of Internet E-Surveys guidelines. A total of 1247 valid responses were collected from practicing dermatologists across 33 of China's provincial-level administrative divisions. Participants evaluated responses to five categories of clinical questions (etiology, clinical presentation, differential diagnosis, treatment, and case study) generated by five LLMs: ChatGPT-4o, Kimi.ai, Doubao, ZuoYiGPT, and Lingyi-agent. Statistical associations between participant characteristics and model preferences were examined using chi-square tests. Results: ChatGPT-4o (Model 1) emerged as the most preferred model across all clinical tasks, consistently receiving the highest number of votes in case study (n = 740), clinical presentation (n = 666), differential diagnosis (n = 707), etiology (n = 602), and treatment (n = 656). Significant variation in model preference by professional title was observed only for the differential diagnosis task (χ^(2) = 21.13, df = 12, p = 0.0485), while no significant differences were found across hospital tiers (p > 0.05). In terms of evaluation dimensions, accuracy was most frequently rated as “very important” (n = 635). A significant association existed between hospital tier and the most valued dimension (χ^(2) = 27.667, df = 9, p = 0.0011), with dermatologists in primary hospitals prioritizing traceability more than their peers in higher-tier hospitals. No significant associations were found across professional titles (p = 0.127). Conclusions: Chinese dermatologists show a strong preference for ChatGPT-4o over domestic LLMs in psoriasis-related clinical tasks. While accuracy remains the primary criterion, traceability and logicality are also critical, particularly for clinicians in lower-tier hospitals. These findings suggest that future clinical LLMs should prioritize not only content accuracy but also source transparency and structural clarity to meet the diverse needs of different clinical settings.
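The chi-square tests of independence mentioned in this abstract can be sketched as follows. This is an illustrative sketch only: the function name and the vote counts are hypothetical, and the four-by-five table shape is an assumption chosen because five models and four respondent categories would give the reported df = 12. The p-value is omitted because it needs the chi-square survival function, which a statistics library would normally supply.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical votes: rows = 4 respondent categories, columns = 5 models.
votes = [
    [120, 40, 30, 20, 10],
    [200, 60, 50, 30, 20],
    [180, 70, 40, 40, 30],
    [150, 50, 45, 35, 27],
]
stat, dof = chi_square_independence(votes)
print(f"chi-square = {stat:.2f}, df = {dof}")
```

The statistic would then be compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value.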
Abstract: With the rapid development of generative artificial intelligence technologies, represented by large language models, university-level computer science education is undergoing a critical transition from knowledge-based instruction to competency-oriented teaching. A postgraduate student competency evaluation model can serve as a framework to organize and guide both teaching and research activities at the postgraduate level, and a number of relevant research efforts have already been conducted in this area. Graduate education plays a vital role not only as a continuation and enhancement of undergraduate education but also as essential preparation for future research endeavors. An analysis of the acceptance of competency evaluation models assesses how various stakeholders perceive the importance of different components within the model. Investigating the degree of acceptance among diverse groups (current undergraduate students, current postgraduate students, graduates with less than three years of work experience, and those with more than three years of work experience) can offer valuable insights for improving and optimizing postgraduate education and training practices.
Abstract: With the continuous development of the nursing discipline, standardized nurse training has always been a crucial link in the development of nursing science and plays an irreplaceable role in talent cultivation. However, current standardized training for some nurses suffers from problems such as oversimplified nursing skill evaluation models and insufficient post competence. Optimizing the training model for nursing talent has therefore become necessary. The problem-based learning (PBL) method and the Direct Observation of Procedural Skills (DOPS) evaluation model provide new directions and guidance for the development of training. Against this background, this paper explores effective approaches for standardized nurse training, starting from basic concepts and gradually moving to specific practical paths, aiming to improve the quality of talent cultivation and provide a useful reference for other researchers.
Abstract: Objective: To explore the application value of a new empowerment teaching method based on Kirkpatrick’s evaluation model in teaching Chinese medicine nursing in otorhinolaryngology. Methods: Sixty nurses who practiced in the otolaryngology department of our hospital from June 2022 to October 2024 were included in the study and divided equally into two groups using convenience sampling. Thirty nurses who chose traditional Chinese medicine skill teaching management formed the control group, and 30 nurses who chose the new empowerment teaching method based on Kirkpatrick’s evaluation model formed the observation group. Indicators such as clinical teaching environment perception, theoretical knowledge scores in Chinese medicine nursing, and the excellent rate in practical operation assessment were compared. Results: The nurses in the observation group had higher scores for clinical teaching environment perception than the control group (P < 0.05). In addition, the midterm and final exam scores for theoretical knowledge of Chinese medicine nursing were higher in the observation group than in the control group (P < 0.05). Compared with the control group, the observation group had a higher excellent rate in practical operation assessment (93.33% vs. 73.33%) and a higher Chinese medicine nursing ability score (215.69 ± 19.73 points vs. 184.87 ± 15.66 points) (P < 0.05). Conclusion: Applying the new empowerment teaching method based on Kirkpatrick’s evaluation model to Chinese medicine nursing teaching in otolaryngology can help nurses understand the theoretical knowledge of Chinese medicine nursing and optimize the clinical teaching environment, thereby promoting their practical skills and Chinese medicine nursing abilities.
Funding: Supported in part by the Education Reform Key Projects of Heilongjiang Province (Grant Nos. SJGZ20220011, SJGZ20220012) and the Excellent Project of Ministry of Education and China Higher Education Association on Digital Ideological and Political Education in Universities (Grant No. GXSZSZJPXM001).
Abstract: This paper proposes a quality evaluation model for software talent cultivation based on multivariate data fusion. The model constructs a comprehensive ability and quality evaluation index system for college students from the perspective of engineering courses, especially software engineering. As for the evaluation method, we rely on students' behavioral data from their school years and aim to make the evaluation model as objective as possible, effectively weakening the negative impact of personal subjective assumptions on the evaluation results.
Abstract: High-resolution modeling is increasingly considered a necessary step for improving the monitoring and prediction of regional air quality. This is especially true for highly urbanized regions with complex terrain and land use. This study uses the Community Multiscale Air Quality (CMAQ) model coupled with the MM5 mesoscale model in a comprehensive analysis to assess the suitability of such a high-resolution modeling system for predicting ozone air quality in the complex terrain of Osaka, Japan. The 1-km and 3-km grid domains were nested inside a 9-km domain, and the domain with the 1-km grid covered the Osaka region. High-resolution Grid Point Value-Mesoscale Model (GPV-MSM) data were used after suitable validation. The simulated ozone concentrations were validated and evaluated with statistical metrics against performance criteria set for ozone. Daily ozone maxima were better simulated by the 1-km grid domain than by the coarser 9-km and 3-km domains, with a maximum improvement in the mean absolute gross error of about 3 ppbv. In addition, the 1-km grid results fared better than the other grids at most of the observation stations that showed noticeable differences in gross error as well as correlation. These results amply justify the use of the integrated high-resolution MM5-CMAQ modeling system in highly urbanized regions, such as the Osaka region, that have complex terrain and land use.
Abstract: BACKGROUND: Gestational diabetes mellitus (GDM) is a condition characterized by high blood sugar levels during pregnancy. The prevalence of GDM is rising globally, and this trend is particularly evident in China, where it has emerged as a significant issue affecting the well-being of expectant mothers and their fetuses. Identifying and addressing GDM in a timely manner is crucial for maintaining the health of both. Therefore, this study aims to establish a risk prediction model for GDM and explore the effects of serum ferritin, blood glucose, and body mass index (BMI) on the occurrence of GDM. AIM: To develop a risk prediction model to analyze factors leading to GDM, and evaluate its efficiency for early prevention. METHODS: The clinical data of 406 pregnant women who underwent routine prenatal examination at Fujian Maternity and Child Health Hospital from April 2020 to December 2022 were retrospectively analyzed. The women were divided into two groups according to whether GDM occurred, in order to analyze the factors affecting GDM. Then, according to the weight of the relevant risk factors, the data were divided into a training set and a validation set at a ratio of 7:3. Subsequently, risk prediction models were established using logistic regression and random forest, and the models were evaluated and verified. RESULTS: Pre-pregnancy BMI, previous history of GDM or macrosomia, hypertension, hemoglobin (Hb) level, triglyceride level, family history of diabetes, serum ferritin, and fasting blood glucose levels during early pregnancy were identified as factors with a significant impact on the development of GDM (P < 0.05). For the nomogram model’s prediction of GDM in pregnancy, the area under the curve (AUC) was 0.883 [95% confidence interval (CI): 0.846-0.921], and the sensitivity and specificity were 74.1% and 87.6%, respectively. The top five variables in the random forest model for predicting the occurrence of GDM were serum ferritin, fasting blood glucose in early pregnancy, pre-pregnancy BMI, Hb level, and triglyceride level. The random forest model achieved an AUC of 0.950 (95% CI: 0.927-0.973), a sensitivity of 84.8%, and a specificity of 91.4%. The DeLong test showed that the AUC value of the random forest model was higher than that of the decision tree model (P < 0.05). CONCLUSION: The random forest model is superior to the nomogram model in predicting the risk of GDM. This method is helpful for early diagnosis and appropriate intervention of GDM.
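Sensitivity and specificity, as reported for the two models in this abstract, follow from the four cells of a confusion matrix. The sketch below uses hypothetical counts (not the study's data) and a hypothetical helper name to show the arithmetic.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical validation-set counts (illustrative only):
# 100 women with GDM, 100 without.
sens, spec = sensitivity_specificity(tp=84, fn=16, tn=90, fp=10)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

Both values depend on the probability threshold used to turn model scores into predictions, whereas the AUC summarizes performance across all thresholds.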
Funding: Supported by the DOE ASR program (Grant No. DESC008468).
Abstract: The planetary boundary layer turbulence and moist convection parameterizations have been modified recently in the NASA Goddard Institute for Space Studies (GISS) Model E2 atmospheric general circulation model (GCM; post-CMIP5, hereafter P5). In this study, single column model (SCM_P5) simulated cloud fractions (CFs), cloud liquid water paths (LWPs), and precipitation were compared with Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) ground-based observations made during the period 2002-08. CMIP5 SCM simulations and GCM outputs over the ARM SGP region were also used in the comparison to identify whether the cloud and precipitation biases resulted from the physical parameterization or the dynamic scheme. The comparison showed that the CMIP5 SCM has difficulties in simulating the vertical structure and seasonal variation of low-level clouds. The new scheme implemented in the turbulence parameterization led to significantly improved cloud simulations in P5. It was found that the SCM is sensitive to the relaxation time scale. When the relaxation time increased from 3 to 24 h, SCM_P5-simulated CFs and LWPs showed a moderate increase (10%-20%) but precipitation increased significantly (56%), which agreed better with observations despite the less accurate atmospheric state. Annual averages among the GCM and SCM simulations were almost the same, but their respective seasonal variations were out of phase. This suggests that the same physical cloud parameterization can generate similar statistical results over a long time period, but different dynamics drive the differences in seasonal variations. This study can potentially provide guidance for the further development of the GISS model.
Funding: Supported by the National Key R&D Program of China (Grant No. 2018YFA0605904) and the National Natural Science Foundation of China (Grant No. 41701411).
Abstract: The simulated Arctic sea ice drift and its relationship with the near-surface wind and surface ocean current during 1979-2014 in nine models from China that participated in the sixth phase of the Coupled Model Intercomparison Project (CMIP6) are examined by comparison with observational and reanalysis datasets. Most of the models reasonably represent the Beaufort Gyre (BG) and Transpolar Drift Stream (TDS) in the spatial patterns of their long-term mean sea ice drift, while the detailed location, extent, and strength of the BG and TDS vary among the models. About two-thirds of the models agree with the observation/reanalysis in that the sea ice drift pattern is consistent with the near-surface wind pattern. About the same proportion of models shows a sea ice drift pattern consistent with the surface ocean current pattern. In the observation/reanalysis, however, the sea ice drift pattern does not match well with the surface ocean current pattern. All nine models miss the observed widespread acceleration of sea ice drift speed across the Arctic. For the Arctic basin-wide spatial average, five of the nine models overestimate the long-term (1979-2014) mean sea ice drift speed in all months. Only FGOALS-g3 captures a significant sea ice drift speed increase from 1979 to 2014 in both spring and autumn, although the increases are weaker than those observed. This evaluation helps assess the performance of the Arctic sea ice drift simulations in these CMIP6 models from China.
Funding: Tehran University of Medical Sciences and Health Services, Grant/Award Number: 98-3-101-45499.
Abstract: Background: Bladder cancer poses a great burden on society, and its high rate of recurrence and treatment failure necessitates the use of appropriate animal models to study its pathogenesis and test novel treatments. Orthotopic models are superior to other types since they provide a normal microenvironment. Four methods are described for developing bladder cancer models inside the animal’s bladder. Direct intramural injection is one of these methods and is widely used. However, its efficacy in model development has not yet been studied. We aimed to evaluate the efficacy and success rate of the direct intramural injection method of developing an orthotopic model for the study of bladder cancer. Method: Tumor cell lines were prepared in four microtubes. Aliquots of 200 × 10^(3) cells were injected through a 27-gauge needle into the ventral wall of the bladders of 4 male and 4 female BALB/c mice following a midline 1 cm laparotomy incision. In addition, 1 million cells from each microtube were injected into the flanks of control mice. To prevent infection and alleviate pain, 5 mg/kg enrofloxacin and 2.5 mg/kg flunixin meglumine, respectively, were injected subcutaneously. Results: Tumors formed in all mice, resulting in a 100% take rate and zero post-operative mortality. Surgery time was ≤ 15 min per mouse. In two mice, tumors were found in the peritoneal space as well. Conclusion: Direct intramural injection is a rapid, reliable, and reproducible method for developing orthotopic models of bladder cancer. It can be done on both male and female mice and requires only readily available surgical tools. However, the needle track can result in cell spillage and peritoneal tumors.
Funding: This research is supported jointly by the National Key R&D Program of China [grant numbers 2016YFA0602100 and 2018YFC1407104], the China Special Fund for Meteorological Research in the Public Interest [grant number GYHY201506011], and the National Natural Science Foundation of China [grant number 41975134].
Abstract: The Los Alamos Sea-Ice Model (CICE) is one of the most popular sea-ice models. All of its versions have served as the main sea-ice module coupled to climate system models. Therefore, evaluating their simulation capability is an important step in developing climate system models. In this paper, the advantages of CICE6.0 (the latest version) are analyzed by comparison with observations and previous versions (CICE4.0 and CICE5.0). CICE6.0 is found to have the smallest interannual errors, and the seasonal cycle it simulates is the most consistent with observations. CICE4.0 severely overestimates winter sea ice and underestimates summer sea ice, while the winter errors of CICE5.0 are larger than those of the other versions. The analysis focuses on perennial ice and seasonal ice. The spatial distribution of root-mean-square errors indicates that the simulation errors are concentrated in the Atlantic sector and the outer Arctic. Both CICE4.0 and CICE5.0 underestimate the concentration of perennial ice and overestimate that of seasonal ice in these areas, whereas CICE6.0 largely resolves this problem. Moreover, the decadal trends it simulates are comparatively the best, especially in the central Arctic sea. The other versions underestimate the decadal trend of perennial ice and overestimate that of seasonal ice. In addition, an index used to objectively describe the difference in spatial distribution between simulation and observation shows that CICE6.0 produces the best simulated spatial distribution.
Funding: Supported by the National Basic Research Program of China (Grant No. 2012CB417403), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA05090402), and the Opening Project of Key Laboratory of Meteorological Disaster of Ministry of Education of Nanjing University of Information Science and Technology (Grant No. KLME1401).
Abstract: The climatological mean state, seasonal variation, and long-term upward trend of 1979-2005 latent heat flux (LHF) in historical runs of 14 coupled general circulation models from CMIP5 (Coupled Model Intercomparison Project Phase 5) are evaluated against OAFlux (Objectively Analyzed air-sea Fluxes) data. Inter-model diversity in simulating the annual mean climatological LHF is discussed. Results show that the models capture the climatological LHF fairly well, but the amplitudes are generally overestimated. Model-simulated seasonal variations of LHF match observations well, again with overestimated amplitudes. Wind speed biases in the CMIP5 models are a possible origin of these biases. Inter-model diversity analysis shows that the overall stronger or weaker LHF over the tropical and subtropical Pacific region, and the meridional variability of LHF, are the two most notable diversities among the CMIP5 models. Regression analysis indicates that the inter-model diversity may come from the diversity of simulated SST and near-surface atmospheric specific humidity. Compared with the observed long-term upward trend, the trends of LHF and wind speed are largely underestimated, while the trends of SST and air specific humidity are grossly overestimated, which may be the origin of the model biases in reproducing the trend of LHF.
Funding: Supported by the National Key R&D Program of China (2017YFA0604600) and the National Natural Science Foundation of China (31722009).
Abstract: Background: An increasing number of ecological processes have been incorporated into Earth system models. However, model evaluations usually lag behind the fast development of models, leading to pervasive simulation uncertainty in key ecological processes, especially the terrestrial carbon (C) cycle. Traceability analysis provides a theoretical basis for tracking and quantifying the structural uncertainty of simulated C storage in models. Thus, a new model evaluation tool based on traceability analysis is urgently needed to efficiently diagnose the sources of inter-model variations in the terrestrial C cycle in Earth system models. Methods: A new cloud-based model evaluation platform, the online traceability analysis system for model evaluation (TraceME v1.0), was established. TraceME was applied to analyze the uncertainties of seven models from the Coupled Model Intercomparison Project (CMIP6). Results: TraceME can effectively diagnose the key sources of different land C dynamics among CMIP6 models. For example, the analyses based on TraceME showed that the estimated global land C storage varied about 2.4-fold across the seven CMIP6 models. Among all models, IPSL-CM6A-LR simulated the lowest land C storage, which mainly resulted from its shortest baseline C residence time. Over the historical period of 1850–2014, gross primary productivity and baseline C residence time were the major contributors to the inter-model variation in ecosystem C storage in most land grid cells. Conclusion: TraceME can facilitate model evaluation by identifying sources of model uncertainty and provides a new tool for the next generation of model evaluation.
Funding: The National Basic Research Program (973 Program) of China under contract No. 2013CBA01805; the National Natural Science Foundation of China under contract No. 41330960; the Plan 111 of Ocean University of China under contract B07036.
Abstract: The simulations of the Arctic Intermediate Water in four climate model and reanalysis datasets, CCSM3, CCSM4, SODA, and GLORYS, are analyzed and evaluated. The climatological core temperatures and depths in both CCSM models deviate by more than 0.5°C and 200 m from the PHC. The SODA reanalysis reproduces relatively reasonable spatial patterns of core temperature and depth, while GLORYS, another reanalysis, shows a remarkable cooling and deepening drift relative to the beginning of the dataset, especially in the Eurasian Basin (about 2°C). The heat contents at the depth of the intermediate water in the CCSM models are overestimated, with large positive errors nearly twice the PHC value. In contrast, GLORYS in 2009 shows a negative error of similar magnitude, which means the characteristic of the water mass is totally lost. The circulations in the two reanalyses at the depth of the intermediate water are more energetic and realistic than those in the CCSMs, which is attributed to their horizontal eddy-permitting resolution. The velocity fields and the transports in the Fram Strait are also investigated, and they again point to the need for finer horizontal resolution. The northward volume transports are much larger in the two reanalyses, although they are still weak compared with mooring observations. Finally, the impact of assimilation is investigated, with evidence of heat input from assimilation. This is thought to be a reason for the good performance of SODA, while GLORYS drifts dramatically without assimilation data in the Arctic Ocean.
Abstract: The emergence of Medical Large Language Models has significantly transformed healthcare. Medical Large Language Models (Med-LLMs) serve as transformative tools that enhance clinical practice through applications in decision support, documentation, and diagnostics. This evaluation examines the performance of leading Med-LLMs, including GPT-4Med, Med-PaLM, MEDITRON, PubMedGPT, and MedAlpaca, across diverse medical datasets. It provides graphical comparisons of their effectiveness in distinct healthcare domains. The study introduces a domain-specific categorization system that aligns these models with optimal applications in clinical decision-making, documentation, drug discovery, research, patient interaction, and public health. The paper addresses deployment challenges of Med-LLMs, emphasizing trustworthiness and explainability as essential requirements for healthcare AI. It presents current evaluation techniques that improve model transparency in high-stakes medical contexts and analyzes regulatory frameworks using benchmarking datasets such as MedQA, MedMCQA, PubMedQA, and MIMIC. By identifying ongoing challenges in bias mitigation, reliability, and ethical compliance, this work serves as a resource for selecting appropriate Med-LLMs and outlines future directions in the field. This analysis offers a roadmap for developing Med-LLMs that balance technological innovation with the trust and transparency required for clinical integration, a perspective often overlooked in existing literature.
Funding: Supported by National Natural Science Foundation of China (51179110).
Abstract: [Objective] The aim was to study an RBF model for evaluating the carrying capacity of water resources based on standardized indices. [Method] The indices were transformed, and the averages of the standard values at different levels were taken as the standardized values of the components of the central vectors for the basis functions of the RBF hidden nodes. Hence, the basis functions are suitable for most indices, which simplifies their expression and calculation. [Result] RBF models obtained through the Monkey-king Genetic Algorithm with weight optimization were used to evaluate water carrying capacity in three districts of Changwu County, Shaanxi Province, and the results were consistent with those of fuzzy evaluation. [Conclusion] The RBF model is simple, practical, and widely applicable.
Funding: Supported by STI/CMA through the National Basic Research Program of China (2015CB452806), Projects for Public Welfare (Meteorology) of China (GYHY201506007), WMO-TLFDP, and the UNESCAP/WMO Typhoon Committee Research Fellowship.
Abstract: The extreme rainfall forecast performance of both ECMWF-IFS and ECMWF-EPS was examined with MET version 5.1 for landfalling Typhoon Soudelor (1513) at lead times up to 60 h. In this study, programs were developed to convert ECMWF's forecast data (both ECMWF-IFS and ECMWF-EPS) into the format needed by MET. In addition, the observed maximum 6-hour accumulated rainfall during landfall was investigated, and the verification of extreme rainfall in Soudelor was then carried out. Results showed that, while traditional verification gives relatively low scores, the method for object-based diagnostic evaluation (MODE) indicates that the significant rainy areas were well predicted in this case study.