In this paper, we establish some strong laws of large numbers for non-independent random variables under the framework of sublinear expectations. One of our main results is for blockwise m-dependent random variables, and another is for sub-orthogonal random variables. Both extend the strong law of large numbers for independent random variables under sublinear expectations to the non-independent case.
Model evaluation using benchmark datasets is an important method for measuring the capability of large language models (LLMs) in specific domains, and it is mainly used to assess the knowledge and reasoning abilities of LLMs. To better assess the capability of LLMs in the agricultural domain, Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture. The assessment dataset used in Agri-Eval covers seven major disciplines in the agricultural domain (crop science, horticulture, plant protection, animal husbandry, forest science, aquaculture science, and grass science) and contains a total of 2283 questions. Among domestic general-purpose LLMs, DeepSeek R1 performed best, with an accuracy rate of 75.49%. Among international general-purpose LLMs, Gemini 2.0 pro exp 0205 stood out as the top performer, achieving an accuracy rate of 74.28%. As a vertical LLM for agriculture, Shennong V2.0 outperformed all the LLMs in China, and its answer accuracy on agricultural knowledge exceeded that of all existing general-purpose LLMs. The launch of Agri-Eval helps LLM developers comprehensively evaluate a model's capability in agriculture through a variety of tasks and tests, promoting the development of LLMs in the field.
To the Editor, Artificial intelligence (AI) usage has been increasing. Many fields have implemented the use of AI and large language models (LLMs), especially medicine. Furthermore, patients have increasingly been using AI; often, they will prompt AI with questions before even stepping into a physician's office. The question is whether the information produced by AI is reliable and whether it is concise and easy to read across all patient populations.
This study evaluated the accuracy, completeness, and comprehensibility of responses from mainstream large language models (LLMs) to hepatitis C virus (HCV)-related questions, aiming to assess their performance in addressing patient queries about disease and lifestyle behaviors. The models selected were ChatGPT-4o, Gemini 2.0 Pro, Claude 3.5 Sonnet, and DeepSeek V3, with 12 questions chosen by two HCV experts from the domains of prevention, diagnosis, and treatment.
It is known that correlation does not imply causality. Some relationships identified in the analysis of data are coincidental or of unknown origin, while others are produced by real-world causality, which is problematic because the two scenarios need to be differentiated. Until recently, the proper (semantic) causality of a relationship could be determined only by human experts in the domain of the studied data. This has changed with the advance of large language models, which are often used as surrogates for such human experts, making the process automated and readily available to all data analysts. This motivates the main objective of this work, which is to introduce the design and implementation of a large language model-based semantic causality evaluator built on correlation analysis, together with its visual analysis model, called the Causal heatmap. After implementation, the model is evaluated in terms of the quality of the visual model, the quality of the causal evaluation based on large language models, and a comparative analysis. The results highlight the usability of large language models for this task and the potential of the proposed approach in the analysis of unknown datasets. The experimental evaluation demonstrates the usefulness of the Causal heatmap method, which clearly highlights interesting relationships while suppressing irrelevant ones.
Electrochromic smart windows (ESWs) can significantly reduce building energy consumption, but their high cost hinders large-scale production. Here, tungsten oxide (WO₃) films are grown in situ by a simple immersion process: the silver nanowires (AgNWs) are oxidized to Ag⁺ ions through electron loss, and the liberated electrons provide the driving force for the deposition of WO₄²⁻. This enabled the fabrication of large-area WO₃ films and ESWs under minimal laboratory conditions, demonstrating the economic feasibility, efficiency, and reliability of industrial production. Structural characterization and density functional theory calculations were combined to confirm that AgNWs effectively regulate the oxygen vacancies of WO₃ films and promote the in situ growth process. The optimized WO₃ exhibits a maximum transmittance modulation of 90.8% and excellent cycling stability over 20,000 cycles. The large-scale WO₃-based ESWs can save up to 140.0 MJ m⁻² of building energy compared with traditional windows in tropical regions, as verified by simulations of more than 40 global cities. This research provides a new approach for improving the performance and industrial production of ESWs, offering an understanding and a development direction that shorten the path to commercial ESW production.
Background: To assess the effectiveness of ChatGPT and Bard in the initial identification of articles for Otolaryngology-Head and Neck Surgery systematic literature reviews. Methods: Three PRISMA-based systematic reviews (Jabbour et al. 2017, Wong et al. 2018, and Wu et al. 2021) were replicated using ChatGPT v3.5 and Bard. Outputs (author, title, publication year, and journal) were compared to the original references and cross-referenced with medical databases for authenticity and recall. Results: Several themes emerged when comparing Bard and ChatGPT across the three reviews. Bard generated more outputs and had greater recall in Wong et al.'s review, with a broader date range in Jabbour et al.'s review. In Wu et al.'s review, ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2. Conclusion: Large language models (LLMs) failed to fully replicate peer-reviewed methodologies, producing outputs with inaccuracies but identifying relevant, especially recent, articles missed by the references. While human-led PRISMA-based reviews remain the gold standard, refining LLMs for literature reviews shows potential.
This study demonstrates a novel integration of large language models, machine learning, and multicriteria decision-making to investigate self-moderation in small online communities, a topic under-explored compared to user behavior and platform-driven moderation on social media. The proposed methodological framework (1) utilizes large language models for social media post analysis and categorization, (2) employs k-means clustering for content characterization, and (3) incorporates the TODIM (Tomada de Decisão Interativa Multicritério) method to determine moderation strategies based on expert judgments. In general, the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems. When applied to social media moderation, this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location. The application of this framework is demonstrated within Facebook groups. Eight distinct content clusters encompassing safety, harassment, diversity, and misinformation are identified. Analysis revealed a preference for content removal across all clusters, suggesting a cautious approach towards potentially harmful content. However, the framework also highlights the use of other moderation actions, like account suspension, depending on the content category. These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.
This study investigates the turbulence-induced disturbances and stall precursor triggering mechanism in a NACA65-18(10) cascade based on large eddy simulations. The results indicate that the disturbances exist under various operating conditions along the performance curve. The shear layer is the physical structure responsible for the generation, propagation, and dissipation of disturbances. When operating near stall, the separation on the suction surface intensifies, and strong unsteady backflow occurs at the trailing edge of the passage. Under the influence of inlet disturbances, unsteady behaviors between passages form specific phase differences, leading the entire system to oscillate in a first-order mode. As the flow develops from near-stall to stall, axial momentum decreases further, reducing the main flow’s ability to drive blockages downstream through convection. Consequently, the blockage accumulates during the circumferential propagation process until the stall onset. Based on the above mechanism, this study proposes factors describing the size of the backflow zone, shedding frequency, and convection velocity to characterize blockage dynamics, identifying critical values that represent the stall onset.
The rapid growth in applications of large language models (LLMs) demonstrates the significance of adaptive and efficient prompt engineering tactics. Existing methods may not be flexible, robust, and streamlined across different domains. This study introduces a prompt optimization framework, named PROMPTx-PE, that aims to yield greater precision and robustness on LLM-based tasks. The proposed system features a prompt selection scheme informed by reinforcement learning, a contextual layer, and a dynamic weighting module regulated by Lyapunov-based stability guidelines. PROMPTx-PE dynamically balances exploration and exploitation of the prompt space, depending on real-time feedback and multi-objective reward development. Extensive testing on both benchmark data (GLUE, SuperGLUE) and domain-specific data (Healthcare-QA and Industrial-NER) demonstrates a best performance of 89.4% and strong robustness, with under 3% computational overhead. The results confirm the effectiveness, consistency, and scalability of PROMPTx-PE as a platform for adaptive prompt engineering in recent uses of LLMs.
Impacted upper ureteral stones are defined as calculi that remain lodged in the same location within the upper ureter for more than two months [1], and they are typically associated with inflammation, mucosal edema, and fibrosis of the surrounding ureteral wall. These stones often lead to significant clinical consequences, including persistent flank pain, hydronephrosis, infection, impaired renal function, and, in severe cases, irreversible kidney damage.
The integration of large-scale foundation models (e.g., GPT series and AlphaFold) into oncology is fundamentally transforming both research methodologies and clinical practices, driven by unprecedented advancements in computational power. This review synthesizes recent progress in the application of large language models to core oncological tasks, including medical imaging analysis, genomic interpretation, and personalized treatment planning. Underpinned by advanced computational infrastructures, such as graphics processing unit/tensor processing unit clusters, heterogeneous computing, and cloud platforms, these models enable superior representation learning and generalization across multimodal data sources. This review examines how these infrastructures overcome key bottlenecks in intelligent oncology through scalable optimization strategies, including mixed-precision training, memory optimization, and heterogeneous computing. Alongside these technical advancements, the review explores pressing challenges, such as data heterogeneity, limited model interpretability, regulatory uncertainties, and the environmental impact of artificial intelligence (AI) systems. Special emphasis is placed on emerging solutions, encompassing green AI and edge computing, which offer promising approaches for low-resource deployment scenarios. Additionally, the review highlights the critical role of interdisciplinary collaboration among oncology, computer science, ethics, and policy to ensure that AI systems are not only powerful but also transparent, safe, and clinically relevant. Finally, the review outlines potential avenues for future research aimed at developing robust, scalable, and human-centered frameworks for intelligent oncology.
Background: Despite the promise shown by large language models (LLMs) for standardized tasks, their multidimensional performance in real-world oncology decision-making remains unevaluated. This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases. Methods: We curated 50 challenging lung cancer cases (25 local and 25 published) classified as complex, rare, or refractory. Blinded three-dimensional, five-point Likert evaluations (1–5 for comprehensiveness, specificity, and readability) compared standalone LLMs (DeepSeek R1, Claude 3.5, Gemini 1.5, and GPT-4o), physicians by experience level (junior, intermediate, and senior), and AI-assisted juniors; intergroup differences and augmentation effects were analyzed statistically. Results: Of 50 challenging cases (18 complex, 17 rare, and 15 refractory) rated by three experts, DeepSeek R1 achieved scores of 3.95±0.33, 3.71±0.53, and 4.26±0.18 for comprehensiveness, specificity, and readability, respectively, positioning it between intermediate (3.68, 3.68, 3.75) and senior (4.50, 4.64, 4.53) physicians. GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness (3.76±0.39, 3.60±0.39) but junior-to-intermediate physician–level specificity (3.39±0.39, 3.39±0.49). All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity. AI-assisted junior physicians showed marked gains in rare cases, with comprehensiveness rising from 2.32 to 4.29 (84.8%), specificity from 2.24 to 4.26 (90.8%), and readability from 2.76 to 4.59 (66.0%), while specificity declined by 3.2% (3.17 to 3.07) in refractory cases. Error analysis showed complementary strengths, with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management. Conclusions: LLMs performed variably in clinical decision-making tasks depending on case type, performing better in rare cases and worse in refractory cases requiring longitudinal reasoning. Complementary strengths between LLMs and physicians support case- and task-tailored human–AI collaboration.
Polyfluoroalkyl substances (PFAS) have emerged as persistent environmental contaminants because of their chemical stability, resistance to degradation, and bioaccumulation potential. However, current studies mainly focus on the toxicity of single PFAS such as perfluorooctanoic acid (PFOA) and perfluorobutanoic acid (PFBA), and knowledge of their combined effects is relatively limited. In this study, we explored the immune response of the gut in large yellow croaker (Larimichthys crocea) under the combined stress of PFOA and PFBA. Histological analyses revealed that the combined exposure induced intestinal vacuolization and decreased the length of intestinal villi. It also significantly activated pro-inflammatory pathways, with marked upregulation of tnfα, il1β, il6, and myd88 expression, particularly after 14 days of exposure. Gut microbiota analysis revealed substantial dysbiosis, including (1) reduced alpha diversity, (2) increased abundance of potentially pathogenic taxa (Proteobacteria and Spirochaetota), and (3) depletion of beneficial Firmicutes. PICRUSt-based functional prediction indicated temporal metabolic shifts, with upregulation of DNA repair pathways at day 3 and enhanced bacterial motility protein activity at days 7 and 14 post-exposure. Pearson correlation analysis further indicated that these immune genes had significant positive correlations with Vibrio and Brevinema, and negative correlations with Streptococcus. This study provides novel insights into microbiome-mediated immunomodulation in large yellow croaker exposed to combined PFAS, which will be helpful for the healthy farming of economically important marine species.
This study presents an implicit multiphysics coupling method integrating Computational Fluid Dynamics (CFD), the Multiphase Particle-in-Cell (MPPIC) model, and the Finite Element Method (FEM), implemented with OpenFOAM, CalculiX, and preCICE to simulate fluid-particle-structure interactions with large deformations. Mesh motion in the fluid field is handled using the radial basis function (RBF) method. The particle phase is modeled by MPPIC, where fluid-particle interaction is described through momentum exchange, and inter-particle collisions are characterized by collision stress. The structural field is solved by nonlinear FEM to capture large deformations induced by geometric nonlinearity. Coupling among fields is realized through a partitioned, parallel, and non-intrusive iterative strategy, ensuring stable transfer and convergence of interface forces and displacements. Notably, the influence of particles on the structure is not direct but mediated by the fluid, while structural motion directly affects particle dynamics. The results demonstrate that the proposed approach effectively captures multiphysics interaction processes and provides a valuable reference for numerical modeling of coupled fluid-particle-structure systems.
The malicious dissemination of hate speech via compromised accounts, automated bot networks, and malware-driven social media campaigns has become a growing cybersecurity concern. Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources. In this paper, we compare two predominant AI-based approaches for the forensic detection of malicious hate speech: (1) fine-tuning encoder-only models that have been trained in Spanish and (2) In-Context Learning techniques (Zero- and Few-Shot Learning) with large-scale language models. Our approach goes beyond binary classification, proposing a comprehensive, multidimensional evaluation that labels each text by (1) type of speech, (2) recipient, (3) level of intensity (ordinal), and (4) targeted group (multi-label). Performance is evaluated on an annotated Spanish corpus using standard metrics such as precision, recall, and F1-score, together with stability-oriented metrics that assess the transition from zero-shot to few-shot prompting (Zero-to-Few Shot Retention and Zero-to-Few Shot Gain). The results indicate that fine-tuned encoder-only models (notably MarIA and BETO variants) consistently deliver the strongest and most reliable performance: in our experiments, their macro F1-scores lie in the range of approximately 46%–66%, depending on the task. Zero-shot approaches are much less stable and typically yield substantially lower performance (observed F1-scores range from approximately 0% to 39%), often producing invalid outputs in practice. Few-shot prompting (e.g., Qwen3 8B, Mistral 7B) generally improves stability and recall relative to pure zero-shot, bringing F1-scores into a moderate range of approximately 20%–51%, but still falls short of fully fine-tuned models. These findings highlight the importance of supervised adaptation and point to the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns.
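The macro F1-score quoted in the abstract above is the unweighted mean of per-class F1-scores. A minimal sketch of that computation follows, assuming a single-label task; the label names and toy predictions are hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class is never hit
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
toy_true = ["hate", "none", "hate", "none"]
toy_pred = ["hate", "hate", "none", "none"]
toy_score = macro_f1(toy_true, toy_pred)
print(f"macro F1 = {toy_score:.2f}")
```

Because every class counts equally, a rare class that the model never predicts correctly pulls the macro average down sharply, which is one reason the metric suits imbalanced hate-speech labels.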
Large Language Models (LLMs) are increasingly applied in the field of code translation. However, existing evaluation methodologies suffer from two major limitations: (1) the high overlap between test data and pretraining corpora, which introduces significant bias in performance evaluation; and (2) mainstream metrics focus primarily on surface-level accuracy, failing to uncover the underlying factors that constrain model capabilities. To address these issues, this paper presents TCode (Translation-Oriented Code Evaluation benchmark), a complexity-controllable, contamination-free benchmark dataset for code translation, alongside a dedicated static feature sensitivity evaluation framework. The dataset is carefully designed to control complexity along multiple dimensions, including syntactic nesting and expression intricacy, enabling both broad coverage and fine-grained differentiation of sample difficulty. This design supports precise evaluation of model capabilities across a wide spectrum of translation challenges. The proposed evaluation framework introduces a correlation-driven analysis mechanism based on static program features, enabling predictive modeling of translation success from two perspectives: Code Form Complexity (e.g., code length and character density) and Semantic Modeling Complexity (e.g., syntactic depth, control-flow nesting, and type system complexity). Empirical evaluations across representative LLMs, including Qwen2.5-72B and Llama3.3-70B, demonstrate that although state-of-the-art models achieve over 80% compilation success on simple samples, their accuracy drops sharply to below 40% on complex cases. Further correlation analysis indicates that Semantic Modeling Complexity alone is correlated with up to 60% of the variance in translation success, with static program features exhibiting nonlinear threshold effects that highlight clear capability boundaries. This study departs from the traditional accuracy-centric evaluation paradigm and, for the first time, systematically characterizes the capabilities of large language models in translation tasks through the lens of static program features. The findings provide actionable insights for model refinement and training strategy development.
Large language models (LLMs) have revolutionized AI applications across diverse domains. However, their widespread deployment has introduced critical security vulnerabilities, particularly prompt injection attacks that manipulate model behavior through malicious instructions. Following Kitchenham’s guidelines, this systematic review synthesizes 128 peer-reviewed studies from 2022 to 2025 to provide a unified understanding of this rapidly evolving threat landscape. Our findings reveal a swift progression from simple direct injections to sophisticated multimodal attacks, achieving over 90% success rates against unprotected systems. In response, defense mechanisms show varying effectiveness: input preprocessing achieves 60%–80% detection rates and advanced architectural defenses demonstrate up to 95% protection against known patterns, though significant gaps persist against novel attack vectors. We identified 37 distinct defense approaches across three categories, but standardized evaluation frameworks remain limited. Our analysis attributes these vulnerabilities to fundamental LLM architectural limitations, such as the inability to distinguish instructions from data and attention mechanism vulnerabilities. This highlights critical research directions such as formal verification methods, standardized evaluation protocols, and architectural innovations for inherently secure LLM designs.
Accurate forecasting of tropical cyclone (TC) tracks and intensities is essential. Although the TianXing large weather model, a six-hourly forecasting model surpassing operational forecasts, exhibits superior performance, its TC forecasts still require enhancement. Prediction errors persist due to biases in the training data and smoothing effects in data-driven methods. To address this, we introduce CycloneBCNet, a deep-learning model designed to correct TianXing’s TC forecast biases by leveraging spatial and temporal data. CycloneBCNet utilizes the SimVP (simpler yet better video prediction) framework with spatial attention to highlight cyclone core regions in forecast fields. It also incorporates TC trend information (center position, maximum wind speed, and minimum sea level pressure) via an LSTM (long short-term memory) module. These TC vectors are derived from post-processed TianXing forecasts. By fusing features from forecast fields and TC vectors, CycloneBCNet corrects biases across multiple lead times. At a 96-h lead time, the track error reduces from 162.4 to 86.4 km, the wind speed error from 17.2 to 6.69 m s⁻¹, and the pressure error from 22.2 to 9.36 hPa. Interpretability analysis shows that CycloneBCNet adjusts its attention across forecast lead times. Intensity corrections prioritize inner-core dynamics, particularly the eye and eyewall, while track corrections shift from lower-level variables and the cyclone’s core to broader environmental factors and mid- to upper-level features as the forecast duration increases. These findings demonstrate that CycloneBCNet effectively captures key TC dynamics consistent with meteorological principles, including the dominance of near-surface conditions for intensity and the increasing influence of steering currents on track prediction.
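To put the 96-h error figures quoted above on a common scale, they can be converted into relative reductions. The short sketch below does only that arithmetic; the dictionary keys and the helper name `relative_reduction` are illustrative choices, and the input numbers are the before/after values stated in the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# (before, after) errors at 96-h lead time, as quoted in the abstract
errors_96h = {
    "track_km": (162.4, 86.4),
    "wind_m_per_s": (17.2, 6.69),
    "pressure_hPa": (22.2, 9.36),
}

reductions = {
    name: relative_reduction(before, after)
    for name, (before, after) in errors_96h.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower")
```

These percentages are derived here and are not reported in the abstract itself; they describe only the single 96-h lead time.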
Objective: To develop a clinical decision and prescription generation system (CDPGS) specifically for diarrhea in traditional Chinese medicine (TCM), utilizing a specialized large language model (LLM), Qwen-TCM-Dia, to standardize diagnostic processes and prescription generation. Methods: Two primary datasets were constructed: an evaluation benchmark and a fine-tuning dataset consisting of fundamental diarrhea knowledge, medical records, and chain-of-thought (CoT) reasoning datasets. After an initial evaluation of 16 open-source LLMs across inference time, accuracy, and output quality, Qwen2.5 was selected as the base model due to its superior overall performance. We then employed a two-stage low-rank adaptation (LoRA) fine-tuning strategy, integrating continued pre-training on domain-specific knowledge with instruction fine-tuning using CoT-enriched medical records. This approach was designed to embed the clinical logic (symptoms→pathogenesis→therapeutic principles→prescriptions) into the model’s reasoning capabilities. The resulting fine-tuned model, specialized for TCM diarrhea, was designated as Qwen-TCM-Dia. Model performance was evaluated for disease diagnosis and syndrome type differentiation using accuracy, precision, recall, and F1-score. Furthermore, the quality of the generated prescriptions was compared with that of established open-source TCM LLMs. Results: Qwen-TCM-Dia achieved the best performance compared to both the base Qwen2.5 model and five other open-source TCM LLMs. It achieved 97.05% accuracy and 91.48% F1-score in disease diagnosis, and 74.54% accuracy and 74.21% F1-score in syndrome type differentiation. Compared with existing open-source TCM LLMs (BianCang, HuangDi, LingDan, TCMLLM-PR, and ZhongJing), Qwen-TCM-Dia exhibited higher fidelity in reconstructing the “symptoms→pathogenesis→therapeutic principles→prescriptions” logic chain. It provided complete prescriptions, whereas other models often omitted dosages or generated mismatched prescriptions. Conclusion: By integrating continued pre-training, CoT reasoning, and a two-stage fine-tuning strategy, this study establishes a CDPGS for diarrhea in TCM. The results demonstrate the synergistic effect of strengthening domain representation through pre-training and activating logical reasoning via CoT. This research not only provides critical technical support for the standardized diagnosis and treatment of diarrhea but also offers a scalable paradigm for the digital inheritance of expert TCM experience and the intelligent transformation of TCM.
Abstract: In this paper, we establish several strong laws of large numbers for non-independent random variables under the framework of sublinear expectations. One of our main results concerns blockwise m-dependent random variables, and another concerns sub-orthogonal random variables. Both extend the strong law of large numbers for independent random variables under sublinear expectations to the non-independent case.
Abstract: Model evaluation using benchmark datasets is an important method for measuring the capability of large language models (LLMs) in specific domains, and it is mainly used to assess their knowledge and reasoning abilities. To better assess the capability of LLMs in the agricultural domain, Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture. The assessment dataset used in Agri-Eval covered seven major disciplines in the agricultural domain (crop science, horticulture, plant protection, animal husbandry, forest science, aquaculture science, and grass science) and contained a total of 2283 questions. Among domestic general-purpose LLMs, DeepSeek R1 performed best, with an accuracy rate of 75.49%. Among international general-purpose LLMs, Gemini 2.0 pro exp 0205 stood out as the top performer, achieving an accuracy rate of 74.28%. As a vertical LLM for agriculture, Shennong V2.0 outperformed all the LLMs in China, and its answer accuracy on agricultural knowledge exceeded that of all existing general-purpose LLMs. The launch of Agri-Eval helps LLM developers comprehensively evaluate a model's capability in agriculture through a variety of tasks and tests, promoting the development of LLMs in this field.
Abstract: To the Editor, Artificial intelligence (AI) usage has been increasing. Many fields have implemented the use of AI and large language models (LLMs), especially medicine. Furthermore, many patients have increasingly been using AI; often, they will prompt AI with questions before even stepping into a physician's office. The question is whether the information produced by AI is reliable and whether it is concise and easy to read across all patient populations.
Funding: Funded by the National Key Research and Development Program of China (No. 2021YFA1100500), the National Natural Science Foundation of China (No. 82370662), and the Key Research & Development Plan of Zhejiang Province (No. 2024C03051).
Abstract: This study evaluated the accuracy, completeness, and comprehensibility of responses from mainstream large language models (LLMs) to hepatitis C virus (HCV)-related questions, aiming to assess their performance in addressing patient queries about the disease and lifestyle behaviors. The models selected were ChatGPT-4o, Gemini 2.0 Pro, Claude 3.5 Sonnet, and DeepSeek V3, with 12 questions chosen by two HCV experts from the domains of prevention, diagnosis, and treatment.
Funding: Supported by the University Grant Agency of Matej Bel University in Banská Bystrica, project number UGA-14-PDS-2025.
Abstract: It is well known that correlation does not imply causality. Some relationships identified in data analysis are coincidental or of unknown origin, while others are produced by real-world causality, so the two scenarios need to be told apart. Until recently, the proper (semantic) causality of a relationship could be determined only by human experts in the domain of the studied data. This has changed with the advance of large language models, which are often used as surrogates for such experts, making the process automated and readily available to all data analysts. This motivates the main objective of this work: to introduce the design and implementation of a large language model-based semantic causality evaluator built on correlation analysis, together with its visual analysis model, called the Causal heatmap. The model is then evaluated in terms of the quality of the visual model, the quality of the LLM-based causal evaluation, and a comparative analysis. The results highlight the usability of large language models for this task and the potential of the proposed approach for analyzing unknown datasets. The experimental evaluation demonstrates the usefulness of the Causal heatmap method, which clearly highlights interesting relationships while suppressing irrelevant ones.
Funding: The National Natural Science Foundation of China (grant Nos. 52163022 and 62305076), the Sichuan Science and Technology Program (2024ZYD0196), the China Postdoctoral Science Foundation (2023M740505), and the Sichuan Postdoctoral Science Special Foundation (No. TB2023010).
Abstract: Electrochromic smart windows (ESWs) can significantly reduce building energy consumption, but their high cost hinders large-scale production. Here, tungsten oxide (WO₃) films are grown in situ by a simple immersion process: silver nanowires (AgNWs) are oxidized to Ag⁺ ions through electron loss, and the liberated electrons provide the driving force for the deposition of WO₄²⁻. This allowed large-area WO₃ films and ESWs to be fabricated under minimal laboratory conditions, demonstrating the economic feasibility, efficiency, and reliability of industrial production. Structural characterization and density functional theory calculations were combined to confirm that AgNWs effectively regulate the oxygen vacancies of WO₃ films and promote the in situ growth process. The optimized WO₃ exhibits a maximum transmittance modulation of 90.8% and excellent cycling stability over 20,000 cycles. The large-scale WO₃-based ESWs can save up to 140.0 MJ m⁻² of building energy compared with traditional windows in tropical regions, as verified by simulations of more than 40 cities worldwide. This research provides a new approach for improving the performance and industrial production of ESWs, offering insight and a development direction that shorten the path to commercial ESW production.
Abstract: Background: To assess the effectiveness of ChatGPT and Bard in the initial identification of articles for Otolaryngology–Head and Neck Surgery systematic literature reviews. Methods: Three PRISMA-based systematic reviews (Jabbour et al. 2017, Wong et al. 2018, and Wu et al. 2021) were replicated using ChatGPT v3.5 and Bard. Outputs (author, title, publication year, and journal) were compared to the original references and cross-referenced with medical databases for authenticity and recall. Results: Several themes emerged when comparing Bard and ChatGPT across the three reviews. Bard generated more outputs and had greater recall in Wong et al.'s review, with a broader date range in Jabbour et al.'s review. In Wu et al.'s review, ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2. Conclusion: Large language models (LLMs) failed to fully replicate peer-reviewed methodologies, producing outputs with inaccuracies but identifying relevant, especially recent, articles missed by the references. While human-led PRISMA-based reviews remain the gold standard, refining LLMs for literature reviews shows potential.
Funding: Funded by the Office of the Vice-President for Research and Development of Cebu Technological University.
Abstract: This study demonstrates a novel integration of large language models, machine learning, and multicriteria decision-making to investigate self-moderation in small online communities, a topic under-explored compared to user behavior and platform-driven moderation on social media. The proposed methodological framework (1) utilizes large language models for social media post analysis and categorization, (2) employs k-means clustering for content characterization, and (3) incorporates the TODIM (Tomada de Decisão Interativa Multicritério) method to determine moderation strategies based on expert judgments. In general, the fully integrated framework leverages the strengths of these intelligent systems for a more systematic evaluation of large-scale decision problems. When applied to social media moderation, this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location. The application of this framework is demonstrated within Facebook groups. Eight distinct content clusters encompassing safety, harassment, diversity, and misinformation are identified. The analysis revealed a preference for content removal across all clusters, suggesting a cautious approach towards potentially harmful content. However, the framework also highlights the use of other moderation actions, like account suspension, depending on the content category. These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.
Funding: Supported by the National Natural Science Foundation of China (Nos. 52322603 and U24A20141), the Science Center for Gas Turbine Project of China (No. P2023-B-Ⅱ-001-001), the Fundamental Research Funds for the Central Universities of China, and the Beijing Nova Program of China (Nos. 20220484074 and 20230484479).
Abstract: This study investigates the turbulence-induced disturbances and the stall precursor triggering mechanism in a NACA65-18(10) cascade based on large eddy simulations. The results indicate that the disturbances exist under various operating conditions along the performance curve. The shear layer is the physical structure responsible for the generation, propagation, and dissipation of disturbances. When operating near stall, the separation on the suction surface intensifies, and strong unsteady backflow occurs at the trailing edge of the passage. Under the influence of inlet disturbances, unsteady behaviors between passages form specific phase differences, leading the entire system to oscillate in a first-order mode. As the flow develops from near-stall to stall, axial momentum decreases further, reducing the main flow’s ability to drive blockages downstream through convection. Consequently, the blockage accumulates during the circumferential propagation process until the stall onset. Based on this mechanism, the study proposes factors describing the size of the backflow zone, shedding frequency, and convection velocity to characterize blockage dynamics, identifying critical values that represent the stall onset.
Funding: Supported by the National Science and Technology Council (NSTC), Taiwan, under grant number 114-2221-E-182-041-MY3, and by Chang Gung University and Chang Gung Memorial Hospital under project number NERPD4Q0021.
Abstract: The rapid growth in applications of large language models (LLMs) underscores the importance of adaptive and efficient prompt engineering strategies. Existing methods may lack adaptability, robustness, and efficiency across domains. This study introduces a prompt optimization framework, named PROMPTx-PE, intended to deliver greater accuracy and robustness on LLM-based tasks. The proposed system features a prompt selection scheme informed by reinforcement learning, a contextual layer, and a dynamic weighting module governed by Lyapunov-based stability guidelines. PROMPTx-PE dynamically balances exploration and exploitation of the prompt space, depending on real-time feedback and multi-objective reward development. Extensive testing on both benchmark (GLUE, SuperGLUE) and domain-specific data (Healthcare-QA and Industrial-NER) shows a best performance of 89.4% and strong robustness, with under 3% computational overhead. The results confirm the effectiveness, consistency, and scalability of PROMPTx-PE as a platform for adaptive prompt engineering in recent uses of LLMs.
Abstract: Impacted upper ureteral stones are defined as calculi that remain lodged in the same location within the upper ureter for more than two months,¹ and they are typically associated with inflammation, mucosal edema, and fibrosis of the surrounding ureteral wall. These stones often lead to significant clinical consequences, including persistent flank pain, hydronephrosis, infection, impaired renal function, and, in severe cases, irreversible kidney damage.
Abstract: The integration of large-scale foundation models (e.g., GPT series and AlphaFold) into oncology is fundamentally transforming both research methodologies and clinical practices, driven by unprecedented advancements in computational power. This review synthesizes recent progress in the application of large language models to core oncological tasks, including medical imaging analysis, genomic interpretation, and personalized treatment planning. Underpinned by advanced computational infrastructures, such as graphics processing unit/tensor processing unit clusters, heterogeneous computing, and cloud platforms, these models enable superior representation learning and generalization across multimodal data sources. This review examines how these infrastructures overcome key bottlenecks in intelligent oncology through scalable optimization strategies, including mixed-precision training, memory optimization, and heterogeneous computing. Alongside these technical advancements, the review explores pressing challenges, such as data heterogeneity, limited model interpretability, regulatory uncertainties, and the environmental impact of artificial intelligence (AI) systems. Special emphasis is placed on emerging solutions, encompassing green AI and edge computing, which offer promising approaches for low-resource deployment scenarios. Additionally, the review highlights the critical role of interdisciplinary collaboration among oncology, computer science, ethics, and policy to ensure that AI systems are not only powerful but also transparent, safe, and clinically relevant. Finally, the review outlines potential avenues for future research aimed at developing robust, scalable, and human-centered frameworks for intelligent oncology.
Abstract: Background: Despite the promise shown by large language models (LLMs) for standardized tasks, their multidimensional performance in real-world oncology decision-making remains unevaluated. This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases. Methods: We curated 50 challenging lung cancer cases (25 local and 25 published) classified as complex, rare, or refractory. Blinded three-dimensional, five-point Likert evaluations (1–5 for comprehensiveness, specificity, and readability) compared standalone LLMs (DeepSeek R1, Claude 3.5, Gemini 1.5, and GPT-4o), physicians by experience level (junior, intermediate, and senior), and AI-assisted juniors; intergroup differences and augmentation effects were analyzed statistically. Results: Of 50 challenging cases (18 complex, 17 rare, and 15 refractory) rated by three experts, DeepSeek R1 achieved scores of 3.95±0.33, 3.71±0.53, and 4.26±0.18 for comprehensiveness, specificity, and readability, respectively, positioning it between intermediate (3.68, 3.68, 3.75) and senior (4.50, 4.64, 4.53) physicians. GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness (3.76±0.39, 3.60±0.39) but junior-to-intermediate physician–level specificity (3.39±0.39, 3.39±0.49). All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity. AI-assisted junior physicians showed marked gains in rare cases, with comprehensiveness rising from 2.32 to 4.29 (84.8%), specificity from 2.24 to 4.26 (90.8%), and readability from 2.76 to 4.59 (66.0%), while specificity declined by 3.2% (3.17 to 3.07) in refractory cases. Error analysis showed complementary strengths, with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management. Conclusions: LLMs performed variably in clinical decision-making tasks depending on case type, performing better in rare cases and worse in refractory cases requiring longitudinal reasoning. Complementary strengths between LLMs and physicians support case- and task-tailored human–AI collaboration.
Funding: Supported by the Ningbo Natural Science Foundation (Youth Foundation, No. 2024J449) and the Scientific Research Foundation for Introduced Talents of Ningbo University (Nos. ZX2022000602 and ZX2024000043).
Abstract: Polyfluoroalkyl substances (PFAS) have emerged as persistent environmental contaminants because of their chemical stability, resistance to degradation, and bioaccumulation potential. However, current studies mainly focus on the toxicity of single PFAS such as perfluorooctanoic acid (PFOA) and perfluorobutanoic acid (PFBA), and knowledge of their combined effects is relatively limited. In this study, we explored the immune response of the gut in large yellow croaker (Larimichthys crocea) under the combined stress of PFOA and PFBA. Histological analyses revealed that the combined exposure induced intestinal vacuolization and decreased the length of intestinal villi. It also significantly activated pro-inflammatory pathways, with marked upregulation of tnfα, il1β, il6, and myd88 expression, particularly after 14 days of exposure. Gut microbiota analysis revealed substantial dysbiosis, including 1) reduced alpha diversity, 2) increased abundance of potentially pathogenic taxa (Proteobacteria and Spirochaetota), and 3) depletion of beneficial Firmicutes. PICRUSt-based functional prediction indicated temporal metabolic shifts, with upregulation of DNA repair pathways at day 3 and enhanced bacterial motility protein activity at days 7 and 14 post-exposure. Pearson correlation analysis further indicated that these immune genes had significant positive correlations with Vibrio and Brevinema, and negative correlations with Streptococcus. This study provides novel insights into microbiome-mediated immunomodulation in large yellow croaker exposed to combined PFAS, which will be helpful for the healthy farming of economically important marine species.
Funding: Supported in part by the Mining Hydraulic Technology and Equipment Engineering Research Center, Liaoning Technical University, Fuxin, China (Grant No. MHTE23-R04) and the Fundamental Research Funds for the Central Universities (ID N25BSS068).
Abstract: This study presents an implicit multiphysics coupling method integrating Computational Fluid Dynamics (CFD), the Multiphase Particle-in-Cell (MPPIC) model, and the Finite Element Method (FEM), implemented with OpenFOAM, CalculiX, and preCICE to simulate fluid-particle-structure interactions with large deformations. Mesh motion in the fluid field is handled using the radial basis function (RBF) method. The particle phase is modeled by MPPIC, where fluid-particle interaction is described through momentum exchange, and inter-particle collisions are characterized by collision stress. The structural field is solved by nonlinear FEM to capture large deformations induced by geometric nonlinearity. Coupling among fields is realized through a partitioned, parallel, and non-intrusive iterative strategy, ensuring stable transfer and convergence of interface forces and displacements. Notably, the influence of particles on the structure is not direct but mediated by the fluid, while structural motion directly affects particle dynamics. The results demonstrate that the proposed approach effectively captures multiphysics interaction processes and provides a valuable reference for numerical modeling of coupled fluid-particle-structure systems.
Funding: Research project LaTe4PoliticES (PID2022-138099OB-I00), funded by MCIN/AEI/10.13039/501100011033 and the European Fund for Regional Development (ERDF), a way to make Europe. Tomás Bernal-Beltrán is supported by the University of Murcia through the predoctoral programme.
Abstract: The malicious dissemination of hate speech via compromised accounts, automated bot networks, and malware-driven social media campaigns has become a growing cybersecurity concern. Automatically detecting such content in Spanish is challenging due to linguistic complexity and the scarcity of annotated resources. In this paper, we compare two predominant AI-based approaches for the forensic detection of malicious hate speech: (1) fine-tuning encoder-only models that have been trained in Spanish and (2) In-Context Learning techniques (Zero- and Few-Shot Learning) with large-scale language models. Our approach goes beyond binary classification, proposing a comprehensive, multidimensional evaluation that labels each text by: (1) type of speech, (2) recipient, (3) level of intensity (ordinal), and (4) targeted group (multi-label). Performance is evaluated on an annotated Spanish corpus using standard metrics such as precision, recall, and F1-score, together with stability-oriented metrics (Zero-to-Few Shot Retention and Zero-to-Few Shot Gain) that assess the transition from zero-shot to few-shot prompting. The results indicate that fine-tuned encoder-only models (notably MarIA and BETO variants) consistently deliver the strongest and most reliable performance: in our experiments their macro F1-scores lie in the range of approximately 46%–66%, depending on the task. Zero-shot approaches are much less stable and typically yield substantially lower performance (observed F1-scores range approximately 0%–39%), often producing invalid outputs in practice. Few-shot prompting (e.g., Qwen 38B, Mistral 7B) generally improves stability and recall relative to pure zero-shot, bringing F1-scores into a moderate range of approximately 20%–51% but still falling short of fully fine-tuned models. These findings highlight the importance of supervised adaptation and point to the potential of both paradigms as components in AI-powered cybersecurity and malware forensics systems designed to identify and mitigate coordinated online hate campaigns.
Abstract: Large Language Models (LLMs) are increasingly applied in the field of code translation. However, existing evaluation methodologies suffer from two major limitations: (1) the high overlap between test data and pretraining corpora, which introduces significant bias in performance evaluation; and (2) mainstream metrics focus primarily on surface-level accuracy, failing to uncover the underlying factors that constrain model capabilities. To address these issues, this paper presents TCode (Translation-Oriented Code Evaluation benchmark), a complexity-controllable, contamination-free benchmark dataset for code translation, alongside a dedicated static feature sensitivity evaluation framework. The dataset is carefully designed to control complexity along multiple dimensions, including syntactic nesting and expression intricacy, enabling both broad coverage and fine-grained differentiation of sample difficulty. This design supports precise evaluation of model capabilities across a wide spectrum of translation challenges. The proposed evaluation framework introduces a correlation-driven analysis mechanism based on static program features, enabling predictive modeling of translation success from two perspectives: Code Form Complexity (e.g., code length and character density) and Semantic Modeling Complexity (e.g., syntactic depth, control-flow nesting, and type system complexity). Empirical evaluations across representative LLMs, including Qwen2.5-72B and Llama3.3-70B, demonstrate that although state-of-the-art models achieve over 80% compilation success on simple samples, their accuracy drops sharply below 40% on complex cases. Further correlation analysis indicates that Semantic Modeling Complexity alone is correlated with up to 60% of the variance in translation success, with static program features exhibiting nonlinear threshold effects that highlight clear capability boundaries. This study departs from the traditional accuracy-centric evaluation paradigm and, for the first time, systematically characterizes the capabilities of large language models in translation tasks through the lens of program static features. The findings provide actionable insights for model refinement and training strategy development.
Funding: Supported by the 2023 Higher Education Scientific Research Planning Project of the China Society of Higher Education (No. 23PG0408), the 2023 Philosophy and Social Science Research Programs in Jiangsu Province (No. 2023SJSZ0993), the Nantong Science and Technology Project (No. JC2023070), the Key Project of the Jiangsu Province Education Science 14th Five-Year Plan (Grant No. B-b/2024/02/41), and the Open Fund of the Advanced Cryptography and System Security Key Laboratory of Sichuan Province (Grant No. SKLACSS-202407).
Abstract: Large language models (LLMs) have revolutionized AI applications across diverse domains. However, their widespread deployment has introduced critical security vulnerabilities, particularly prompt injection attacks that manipulate model behavior through malicious instructions. Following Kitchenham’s guidelines, this systematic review synthesizes 128 peer-reviewed studies from 2022 to 2025 to provide a unified understanding of this rapidly evolving threat landscape. Our findings reveal a swift progression from simple direct injections to sophisticated multimodal attacks, achieving over 90% success rates against unprotected systems. In response, defense mechanisms show varying effectiveness: input preprocessing achieves 60%–80% detection rates and advanced architectural defenses demonstrate up to 95% protection against known patterns, though significant gaps persist against novel attack vectors. We identified 37 distinct defense approaches across three categories, but standardized evaluation frameworks remain limited. Our analysis attributes these vulnerabilities to fundamental LLM architectural limitations, such as the inability to distinguish instructions from data and attention mechanism vulnerabilities. This highlights critical research directions such as formal verification methods, standardized evaluation protocols, and architectural innovations for inherently secure LLM designs.
Funding: Supported by the Meteorological Joint Funds of the National Natural Science Foundation of China (Grant No. U2142211), the National Natural Science Foundation of China (Grant Nos. 42075141, 42341202, and 62088101), the National Key Research and Development Program of China (Grant No. 2020YFA0608000), and the Shanghai Municipal Science and Technology Major Project (Grant No. 2021SHZDZX0100).
Abstract: Accurate forecasting of tropical cyclone (TC) tracks and intensities is essential. Although the TianXing large weather model, a six-hourly forecasting model surpassing operational forecasts, exhibits superior performance, its TC forecasts still require enhancement. Prediction errors persist due to biases in the training data and smoothing effects in data-driven methods. To address this, we introduce CycloneBCNet, a deep-learning model designed to correct TianXing’s TC forecast biases by leveraging spatial and temporal data. CycloneBCNet utilizes the SimVP (simpler yet better video prediction) framework with spatial attention to highlight cyclone core regions in forecast fields. It also incorporates TC trend information (center position, maximum wind speed, and minimum sea level pressure) via an LSTM (long short-term memory) module. These TC vectors are derived from post-processed TianXing forecasts. By fusing features from forecast fields and TC vectors, CycloneBCNet corrects biases across multiple lead times. At a 96-h lead time, the track error is reduced from 162.4 to 86.4 km, the wind speed error from 17.2 to 6.69 m s⁻¹, and the pressure error from 22.2 to 9.36 hPa. Interpretability analysis shows that CycloneBCNet adjusts its attention across forecast lead times. Intensity corrections prioritize inner-core dynamics, particularly the eye and eyewall, while track corrections shift from lower-level variables and the cyclone’s core to broader environmental factors and mid- to upper-level features as the forecast duration increases. These findings demonstrate that CycloneBCNet effectively captures key TC dynamics consistent with meteorological principles, including the dominance of near-surface conditions for intensity and the increasing influence of steering currents on track prediction.
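For readers unfamiliar with how a track error in kilometers is usually obtained, the following is a minimal sketch that assumes the error is the great-circle distance between the forecast and observed cyclone centers. The abstract does not state the exact formula used; the function name, the haversine formulation, and the Earth-radius constant are illustrative assumptions, not the authors' code.

```python
import math

# Mean Earth radius in km (an assumed constant; the abstract gives none).
EARTH_RADIUS_KM = 6371.0


def track_error_km(lat_fcst, lon_fcst, lat_obs, lon_obs):
    """Great-circle (haversine) distance in km between a forecast and an
    observed cyclone center, both given in decimal degrees.

    Hypothetical helper for illustration only.
    """
    phi_f = math.radians(lat_fcst)
    phi_o = math.radians(lat_obs)
    d_phi = phi_o - phi_f
    d_lambda = math.radians(lon_obs - lon_fcst)

    # Haversine formula: a is the squared half-chord length between points.
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_f) * math.cos(phi_o) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Averaging this distance over all verified forecasts at a given lead time would give a mean track error comparable in kind to the figures quoted above, under the stated assumption.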
Funding: National Key Research and Development Program of China (2024YFC3505400), Capital Clinical Project of Beijing Municipal Science & Technology Commission (Z221100007422092), and Capital’s Funds for Health Improvement and Research (2024-1-2231).
Abstract: Objective: To develop a clinical decision and prescription generation system (CDPGS) specifically for diarrhea in traditional Chinese medicine (TCM), utilizing a specialized large language model (LLM), Qwen-TCM-Dia, to standardize diagnostic processes and prescription generation. Methods: Two primary datasets were constructed: an evaluation benchmark and a fine-tuning dataset consisting of fundamental diarrhea knowledge, medical records, and chain-of-thought (CoT) reasoning datasets. After an initial evaluation of 16 open-source LLMs across inference time, accuracy, and output quality, Qwen2.5 was selected as the base model due to its superior overall performance. We then employed a two-stage low-rank adaptation (LoRA) fine-tuning strategy, integrating continued pre-training on domain-specific knowledge with instruction fine-tuning using CoT-enriched medical records. This approach was designed to embed the clinical logic (symptoms → pathogenesis → therapeutic principles → prescriptions) into the model’s reasoning capabilities. The resulting fine-tuned model, specialized for TCM diarrhea, was designated as Qwen-TCM-Dia. Model performance was evaluated for disease diagnosis and syndrome type differentiation using accuracy, precision, recall, and F1-score. Furthermore, the quality of the generated prescriptions was compared with that of established open-source TCM LLMs. Results: Qwen-TCM-Dia achieved peak performance compared to both the base Qwen2.5 model and five other open-source TCM LLMs. It achieved 97.05% accuracy and 91.48% F1-score in disease diagnosis, and 74.54% accuracy and 74.21% F1-score in syndrome type differentiation. Compared with existing open-source TCM LLMs (BianCang, HuangDi, LingDan, TCMLLM-PR, and ZhongJing), Qwen-TCM-Dia exhibited higher fidelity in reconstructing the “symptoms → pathogenesis → therapeutic principles → prescriptions” logic chain. It provided complete prescriptions, whereas other models often omitted dosages or generated mismatched prescriptions. Conclusion: By integrating continued pre-training, CoT reasoning, and a two-stage fine-tuning strategy, this study establishes a CDPGS for diarrhea in TCM. The results demonstrate the synergistic effect of strengthening domain representation through pre-training and activating logical reasoning via CoT. This research not only provides critical technical support for the standardized diagnosis and treatment of diarrhea but also offers a scalable paradigm for the digital inheritance of expert TCM experience and the intelligent transformation of TCM.
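The four metrics named in the Methods (accuracy, precision, recall, and F1-score) can be sketched for a multi-class labeling task such as syndrome type differentiation. This is a minimal illustration that assumes macro averaging over classes; the abstract does not say which averaging was used, and the function name and the labels in the usage note are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (unweighted mean over classes) is an assumption made
    for this sketch, not something stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    def safe_div(num, den):
        # Define 0/0 as 0.0 so empty classes do not raise.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), n),
        "recall": safe_div(sum(recalls), n),
        "f1": safe_div(sum(f1s), n),
    }
```

As a usage example with made-up labels, `classification_metrics(["damp-heat", "damp-heat", "spleen-deficiency", "spleen-deficiency"], ["damp-heat", "spleen-deficiency", "spleen-deficiency", "spleen-deficiency"])` scores three of four predictions as correct. Accuracy and F1 can diverge noticeably when classes are imbalanced, which is one reason both are commonly reported together.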