With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more gener...With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more general constructs.Based on the profile mechanism of unified modeling language(UML) 2.2,a kind of DSML is presented to model simulation testing systems of avionic software(STSAS).To define the syntax,semantics and notions of the DSML,the domain model of the STSAS from which we generalize the domain concepts and relationships among these concepts is given,and then,the domain model is mapped into a UML meta-model,named UML-STSAS profile.Assuming a flight control system(FCS) as system under test(SUT),we design the relevant STSAS.The results indicate that extending UML to the simulation testing domain can effectively and precisely model STSAS.展开更多
This paper presents a model-based framework for the dynamic verification of spacecraft systems,which tightly integrates an executable systems modeling language architectural model with the in-house orbital analysis to...This paper presents a model-based framework for the dynamic verification of spacecraft systems,which tightly integrates an executable systems modeling language architectural model with the in-house orbital analysis tool SpaceSim to achieve a closed-loop workflow encompassing system design,analysis,and verification.The method is abstracted into 2 generic types of meta-models:the co-simulation meta-model,which captures the structure of co-simulation commands and data formats,and the system-of-interest meta-model,which ensures hierarchical and modular system architectures,thereby supporting flexible iterative design and verification.The proposed framework is demonstrated through a space mission case study,in which dynamic simulation is used to compute key performance indicators such as energy and information flow balance and to validate associated requirements in real time.The adaptability of the approach is further evaluated through multiple simulated mission change scenarios across 3 dimensions:simulation context,system behavior,and parameter modification.Results indicate that the proposed method effectively reduces the complexity and effort required for model updates and enhances the overall flexibility of system analysis.This study offers a generalizable paradigm for integrating model-based systems engineering with domain-specific simulation tools,laying the groundwork for subsequent highfidelity model replacement,trade-off analysis,and optimization-based design.展开更多
In materials science and engineering design,high-fidelity and high-efficiency numerical simulation has become a driving force for innovation and practical implementation.To address longstanding bottlenecks in the deve...In materials science and engineering design,high-fidelity and high-efficiency numerical simulation has become a driving force for innovation and practical implementation.To address longstanding bottlenecks in the development of conventional material constitutive models—such as lengthy modeling cycles and difficulties in numerical implementation—this study proposes an intelligent modeling and code generation approach powered by large languagemodels.A structured knowledge base integrating constitutive theory,numerical algorithms,and UMAT(User Material)interface specifications is constructed,and a retrieval-augmented generation strategy is employed to establish an end-to-end workflow spanning experimental data parsing,constitutive model formulation,and automatic UMAT subroutine generation.Experimental results show that the method achieves high accuracy for both a classical Johnson–Cookmodel and a physics-informed neural network(PINN)model,with key parameter identification errors below 5%.Moreover,the automatically generated UMAT subroutines yield finite element simulation results in Abaqus that are highly consistent with theoretical predictions(coefficient of determination R2>0.98)while maintaining good numerical stability.This framework is currently focused on the automatic construction of rate-dependent elastoplastic material models,and its core method also provides a clear path for extending to other constitutive categories such as hyperelasticity and viscoelasticity.This work provides an effective technical route for the rapid development and reliable numerical implementation of material constitutive models,significantly advancing the intelligence level of computational mechanics research and improving engineering application efficiency.展开更多
Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. Howeve...Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. However, the development of such models requires specialized expertise in data science, limiting their broader application. Large language models (LLMs), such as GPT-4, have demonstrated potential in supporting and guiding research efforts. This work presents a novel AI-assisted framework where GPT-4, through well-engineered prompts, facilitates the construction and explanation of multi-objective neural networks. These models predict hydrotreating products properties (such as distillation range), including refined diesel and refined gas oil, using feedstock properties, operating conditions, and recycle hydrogen composition. Gradient-weighted class activation mapping was employed to identify key features influencing the output variables. This work illustrates an innovative AI-guided paradigm for chemical engineering applications, and the designed prompts hold promise for adaptation to other complex processes.展开更多
Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartph...Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartphones and increased Internet connectivity,SMS spam has emerged as a prevalent threat.Spammers have recognized the critical role SMS plays in today’s modern communication,making it a prime target for abuse.As cybersecurity threats continue to evolve,the volume of SMS spam has increased substantially in recent years.Moreover,the unstructured format of SMS data creates significant challenges for SMS spam detection,making it more difficult to successfully combat spam attacks.In this paper,we present an optimized and fine-tuned transformer-based Language Model to address the problem of SMS spam detection.We use a benchmark SMS spam dataset to analyze this spam detection model.Additionally,we utilize pre-processing techniques to obtain clean and noise-free data and address class imbalance problem by leveraging text augmentation techniques.The overall experiment showed that our optimized fine-tuned BERT(Bidirectional Encoder Representations from Transformers)variant model RoBERTa obtained high accuracy with 99.84%.To further enhance model transparency,we incorporate Explainable Artificial Intelligence(XAI)techniques that compute positive and negative coefficient scores,offering insight into the model’s decision-making process.Additionally,we evaluate the performance of traditional machine learning models as a baseline for comparison.This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.展开更多
Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empow...Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empowered by the large language model(LLM),we propose a novel multi-agent collaborative framework to streamline the end-to-end OPC UA IM modeling process.Each agent is equipped with meticulously engineered prompt templates,augmenting their capacity to execute specific tasks.We conduct modeling experiments using real textual data to demonstrate the effectiveness of the proposed method,improving modeling efficiency and reducing the labor workload.展开更多
Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLM...Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLMs.Therefore,in order to better assess the capability of LLMs in the agricultural domain,Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture.The assessment dataset used in Agri-Eval covered seven major disciplines in the agricultural domain:crop science,horticulture,plant protection,animal husbandry,forest science,aquaculture science,and grass science,and contained a total of 2283 questions.Among domestic general-purpose LLMs,DeepSeek R1 performed best with an accuracy rate of 75.49%.In the realm of international general-purpose LLMs,Gemini 2.0 pro exp 0205 standed out as the top performer,achieving an accuracy rate of 74.28%.As an LLMs in agriculture vertical,Shennong V2.0 outperformed all the LLMs in China,and the answer accuracy rate of agricultural knowledge exceeded that of all the existing general-purpose LLMs.The launch of Agri-Eval helped the LLM developers to comprehensively evaluate the model's capability in the field of agriculture through a variety of tasks and tests to promote the development of the LLMs in the field of agriculture.展开更多
This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing...This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing patient queries about disease and lifestyle behaviors.The models selected were ChatGPT-4o,Gemini 2.0 Pro,Claude 3.5 Sonnet,and DeepSeek V3,with 12 questions chosen by two HCV experts from the domains of prevention,diagnosis,and treatment.展开更多
Recommendation systems are key to boosting user engagement,satisfaction,and retention,particularly on media platforms where personalized content is vital.Sequential recommendation systems learn from user-item interact...Recommendation systems are key to boosting user engagement,satisfaction,and retention,particularly on media platforms where personalized content is vital.Sequential recommendation systems learn from user-item interactions to predict future items of interest.However,many current methods rely on unique user and item IDs,limiting their ability to represent users and items effectively,especially in zero-shot learning scenarios where training data is scarce.With the rapid development of Large Language Models(LLMs),researchers are exploring their potential to enhance recommendation systems.However,there is a semantic gap between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems,where items are typically indexed by IDs.Moreover,most research focuses on item representations,neglecting personalized user modeling.To address these issues,we propose a sequential recommendation framework using LLMs,called CIT-Rec,a model that integrates Collaborative semantics for user representation and Image and Text information for item representation to enhance Recommendations.Specifically,by aligning intuitive image information with text containing semantic features,we can more accurately represent items,improving item representation quality.We focus not only on item representations but also on user representations.To more precisely capture users’personalized preferences,we use traditional sequential recommendation models to train on users’historical interaction data,effectively capturing behavioral patterns.Finally,by combining LLMs and traditional sequential recommendation models,we allow the LLM to understand linguistic semantics while capturing collaborative semantics.Extensive evaluations on real-world datasets show that our model outperforms baseline methods,effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations.展开更多
Classifying job offers into occupational categories is a fundamental task in human resource information systems,as it improves and streamlines indexing,search,and matching between openings and job seekers.Comprehensiv...Classifying job offers into occupational categories is a fundamental task in human resource information systems,as it improves and streamlines indexing,search,and matching between openings and job seekers.Comprehensive occupational databases such as O∗NET or ESCO provide detailed taxonomies of interrelated positions that can be leveraged to align the textual content of postings with occupational categories,thereby facilitating standardization,cross-system interoperability,and access to metadata for each occupation(e.g.,tasks,knowledge,skills,and abilities).In this work,we explore the effectiveness of fine-tuning existing language models(LMs)to classify job offers with occupational descriptors from O∗NET.This enables a more precise assessment of candidate suitability by identifying the specific knowledge and skills required for each position,and helps automate recruitment processes by mitigating human bias and subjectivity in candidate selection.We evaluate three representative BERT-like models:BERT,RoBERTa,and DeBERTa.BERT serves as the baseline encoder-only architecture;RoBERTa incorporates advances in pretraining objectives and data scale;and DeBERTa introduces architectural improvements through disentangled attention mechanisms.The best performance was achieved with the DeBERTa model,although the other models also produced strong results,and no statistically significant differences were observed acrossmodels.We also find that these models typically reach optimal performance after only a few training epochs,and that training with smaller,balanced datasets is effective.Consequently,comparable results can be obtained with models that require fewer computational resources and less training time,facilitating deployment and practical use.展开更多
It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problemat...It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problematic,since there is a need to differentiate between these two scenarios.Until recently,the proper−semantic−causality of the relationship could have been determined only by human experts from the area of expertise of the studied data.This has changed with the advance of large language models,which are often utilized as surrogates for such human experts,making the process automated and readily available to all data analysts.This motivates the main objective of this work,which is to introduce the design and implementation of a large language model-based semantic causality evaluator based on correlation analysis,together with its visual analysis model called Causal heatmap.After the implementation itself,the model is evaluated from the point of view of the quality of the visual model,from the point of view of the quality of causal evaluation based on large language models,and from the point of view of comparative analysis,while the results reached in the study highlight the usability of large language models in the task and the potential of the proposed approach in the analysis of unknown datasets.The results of the experimental evaluation demonstrate the usefulness of the Causal heatmap method,supported by the evident highlighting of interesting relationships,while suppressing irrelevant ones.展开更多
Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Ja...Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Jabbour et al.2017,Wong et al.2018,and Wu et al.2021)were replicated using ChatGPTv3.5 and Bard.Outputs(author,title,publication year,and journal)were compared to the original references and cross-referenced with medical databases for authenticity and recall.Results:Several themes emerged when comparing Bard and ChatGPT across the three reviews.Bard generated more outputs and had greater recall in Wong et al.'s review,with a broader date range in Jabbour et al.'s review.In Wu et al.'s review,ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.Conclusion:Large language models(LLMs)failed to fully replicate peer-reviewed methodologies,producing outputs with inaccuracies but identifying relevant,especially recent,articles missed by the references.While human-led PRISMA-based reviews remain the gold standard,refining LLMs for literature reviews shows potential.展开更多
This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to use...This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.展开更多
Knowledge distillation has become a standard technique for compressing large language models into efficient student models,but existing methods often struggle to balance prediction accuracy with explanation quality.Re...Knowledge distillation has become a standard technique for compressing large language models into efficient student models,but existing methods often struggle to balance prediction accuracy with explanation quality.Recent approaches such as Distilling Step-by-Step(DSbS)introduce explanation supervision,yet they apply it in a uniform manner that may not fully exploit the different learning dynamics of prediction and explanation.In this work,we propose a task-structured curriculum learning(TSCL)framework that structures training into three sequential phases:(i)prediction-only,to establish stable feature representations;(ii)joint prediction-explanation,to align task outputs with rationale generation;and(iii)explanation-only,to refine the quality of rationales.This design provides a simple but effective modification to DSbS,requiring no architectural changes and adding negligible training cost.We justify the phase scheduling with ablation studies and convergence analysis,showing that an initial prediction-heavy stage followed by a balanced joint phase improves both stability and explanation alignment.Extensive experiments on five datasets(e-SNLI,ANLI,CommonsenseQA,SVAMP,and MedNLI)demonstrate that TSCL consistently outperforms strong baselines,achieving gains of+1.7-2.6 points in accuracy and 0.8-1.2 in ROUGE-L,corresponding to relative error reductions of up to 21%.Beyond lexical metrics,human evaluation and ERASERstyle faithfulness diagnostics confirm that TSCL produces more faithful and informative explanations.Comparative training curves further reveal faster convergence and lower variance across seeds.Efficiency analysis shows less than 3%overhead in wall-clock training time and no additional inference cost,making the approach practical for realworld deployment.This study demonstrates that a simple task-structured curriculum can significantly improve the effectiveness of knowledge distillation.By separating and sequencing objectives,TSCL achieves a better balance between accuracy,stability,and explanation quality.The framework generalizes across domains,including medical NLI,and offers a principled recipe for future applications in multimodal reasoning and reinforcement learning.展开更多
Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a fr...Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases.Methods:We curated 50 challenging lung cancer cases(25 local and 25 published)classified as complex,rare,or refractory.Blinded three-dimensional,five-point Likert evaluations(1–5 for comprehensiveness,specificity,and readability)compared standalone LLMs(DeepSeek R1,Claude 3.5,Gemini 1.5,and GPT-4o),physicians by experience level(junior,intermediate,and senior),and AI-assisted juniors;intergroup differences and augmentation effects were analyzed statistically.Results:Of 50 challenging cases(18 complex,17 rare,and 15 refractory)rated by three experts,DeepSeek R1 achieved scores of 3.95±0.33,3.71±0.53,and 4.26±0.18 for comprehensiveness,specificity,and readability,respectively,positioning it between intermediate(3.68,3.68,3.75)and senior(4.50,4.64,4.53)physicians.GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness(3.76±0.39,3.60±0.39)but junior-to-intermediate physician–level specificity(3.39±0.39,3.39±0.49).All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity.AIassisted junior physicians showed marked gains in rare cases,with comprehensiveness rising from 2.32 to 4.29(84.8%),specificity from 2.24 to 4.26(90.8%),and readability from 2.76 to 4.59(66.0%),while specificity declined by 3.2%(3.17 to 3.07)in refractory cases.Error analysis showed complementary strengths,with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management.Conclusions:LLMs performed variably in clinical decision-making tasks depending on case type,performing better in rare cases and worse in refractory cases requiring longitudinal reasoning.Complementary strengths between LLMs and physicians support case-and task-tailored human–AI collaboration.展开更多
The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streaml...The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streamlined in different domains.The offered study introduces an immediate optimization outline,named PROMPTx-PE,that is going to yield a greater level of precision and strength when it comes to the assignments that are premised on LLM.The proposed systemfeatures a timely selection schemewhich is informed by reinforcement learning,a contextual layer and a dynamic weighting module which is regulated by Lyapunov-based stability guidelines.The PROMPTx-PE dynamically varies the exploration and exploitation of the prompt space,depending on real-time feedback and multi-objective reward development.Extensive testing on both benchmark(GLUE,SuperGLUE)and domain-specific data(Healthcare-QA and Industrial-NER)demonstrates a large best performance to be 89.4%and a strong robustness disconnect with under 3%computation expense.The results confirm the effectiveness,consistency,and scalability of PROMPTx-PE as a platform of adaptive prompt engineering based on recent uses of LLMs.展开更多
War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient an...War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient and inflexible,with particularly pronounced limitations in command and decision-making.The overwhelming volume of information and high decision complexity hinder the realization of autonomous and agile command and control.To address this challenge,an intelligent warfare simulation framework named Command-Agent is proposed,which deeply integrates large language models(LLMs)with digital twin battlefields.By constructing a highly realistic battlefield environment through real-time simulation and multi-source data fusion,the natural language interaction capabilities of LLMs are leveraged to lower the command threshold and to enable autonomous command through the Observe-Orient-Decide-Act(OODA)feedback loop.Within the Command-Agent framework,a multimodel collaborative architecture is further adopted to decouple the decision-generation and command-execution functions of LLMs.By combining specialized models such as Deep Seek-R1 and MCTool,the limitations of single-model capabilities are overcome.MCTool is a lightweight execution model fine-tuned for military Function Calling tasks.The framework also introduces a Vector Knowledge Base to mitigate hallucinations commonly exhibited by LLMs.Experimental results demonstrate that Command-Agent not only enables natural language-driven simulation and control but also deeply understands commander intent.Leveraging the multi-model collaborative architecture,during red-blue UAV confrontations involving 2 to 8 UAVs,the integrated score is improved by an average of 41.8%compared to the single-agent system(MCTool),accompanied by a 161.8%optimization in the battle loss ratio.Furthermore,when compared with multi-agent systems lacking the knowledge base,the inclusion of the Vector Knowledge Base further improves overall performance by 16.8%.In comparison with the general model(Qwen2.5-7B),the fine-tuned MCTool leads by 5%in execution efficiency.Therefore,the proposed Command-Agent introduces a novel perspective to the military command system and offers a feasible solution for intelligent battlefield decision-making.展开更多
The personalized fine-tuning of large languagemodels(LLMs)on edge devices is severely constrained by limited computation resources.Although split federated learning alleviates on-device burdens,its effectiveness dimin...The personalized fine-tuning of large languagemodels(LLMs)on edge devices is severely constrained by limited computation resources.Although split federated learning alleviates on-device burdens,its effectiveness diminishes in few-shot reasoning scenarios due to the low data efficiency of conventional supervised fine-tuning,which leads to excessive communication overhead.To address this,we propose Language-Empowered Split Fine-Tuning(LESFT),a framework that integrates split architectures with a contrastive-inspired fine-tuning paradigm.LESFT simultaneously learns frommultiple logically equivalent but linguistically diverse reasoning chains,providing richer supervisory signals and improving data efficiency.This process-oriented training allows more effective reasoning adaptation with fewer samples.Extensive experiments demonstrate that LESFT consistently outperforms strong baselines such as SplitLoRA in task accuracy.LESFT consistently outperforms strong baselines on GSM8K,CommonsenseQA,and AQUA_RAT,with the largest gains observed on Qwen2.5-3B.These results indicate that LESFT can effectively adapt large language models for reasoning tasks under the computational and communication constraints of edge environments.展开更多
The collection and annotation of lar ge-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems.While self-super...The collection and annotation of lar ge-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems.While self-supervised learning(SSL)has emerged as a promising approach for leveraging unannotated data,current SSL methods face two critical challenges in bird species recognition:(1)long-tailed data distributions that result in poor performance on underrepresented species;and(2)domain shift issues caused by data augmentation strategies designed to mitigate class imbalance.Here we present SDNet,a novel SSL-based bird recognition framework that integrates diffusion models with large language models(LLMs)to overcome these limitations.SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy,morphological attributes,and habitat information,producing detailed natural language priors that capture fine-grained visual characteristics(e.g.,plumage patterns,body proportions,and distinctive markings).These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process,ensuring generated images preserve species-specific morphological details while maintaining photorealistic quality.Additionally,we incorporate a Swin Transformer as the feature extraction backbone whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing finegrained discriminative patterns(such as beak shape and feather texture)while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources.SDNet is validated on both a self-constructed dataset(Bird_BXS)an d a publicly available benchmark(Birds_25),demonstrating substantial improvements over conventional SSL approaches.Our results indicate that the synergistic integration of LLMs,diffusion models,and the Swin Transformer architecture contributes significantly to recognition accuracy,particularly for rare and morphologically similar species.These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.展开更多
API(Application Programming Interface)documentation often only describes individual APIs and lacks information on complex API relations and code examples.Retrieval-based and generation-based methods can both produce d...API(Application Programming Interface)documentation often only describes individual APIs and lacks information on complex API relations and code examples.Retrieval-based and generation-based methods can both produce documentation that includes API relationship descriptions and code examples.However,they are limited by the richness of available API resources.As a result,they struggle to be effective when dealing with resource-scarce languages such as Kotlin.We propose an on-demand API tutorial generation method for resource-scarce languages,transferring API knowledge from a resource-rich language like Java to Kotlin using an AI chain.Evaluating our method on 500 Kotlin APIs,we generated more API documents than the state-of-the-art retrieval-based method ADECK and the generate-based method gDoc.The number of API guidelines generated by our method is 37 times that of ADECK and 1.6 times that of gDoc.Compared with the scheme that did not adopt the knowledge transfer strategy,the success rate of our method has increased by 31.25 percentage points.This demonstrates the feasibility and potential of using LLMs to create new API knowledge across languages.展开更多
基金Aeronautical Science Foundation of China (20095551025)
文摘With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more general constructs.Based on the profile mechanism of unified modeling language(UML) 2.2,a kind of DSML is presented to model simulation testing systems of avionic software(STSAS).To define the syntax,semantics and notions of the DSML,the domain model of the STSAS from which we generalize the domain concepts and relationships among these concepts is given,and then,the domain model is mapped into a UML meta-model,named UML-STSAS profile.Assuming a flight control system(FCS) as system under test(SUT),we design the relevant STSAS.The results indicate that extending UML to the simulation testing domain can effectively and precisely model STSAS.
基金supported by the National Defense Key Laboratory of Space Intelligent Control Technology,China(grant number HTKJ2022KL502003).
文摘This paper presents a model-based framework for the dynamic verification of spacecraft systems,which tightly integrates an executable systems modeling language architectural model with the in-house orbital analysis tool SpaceSim to achieve a closed-loop workflow encompassing system design,analysis,and verification.The method is abstracted into 2 generic types of meta-models:the co-simulation meta-model,which captures the structure of co-simulation commands and data formats,and the system-of-interest meta-model,which ensures hierarchical and modular system architectures,thereby supporting flexible iterative design and verification.The proposed framework is demonstrated through a space mission case study,in which dynamic simulation is used to compute key performance indicators such as energy and information flow balance and to validate associated requirements in real time.The adaptability of the approach is further evaluated through multiple simulated mission change scenarios across 3 dimensions:simulation context,system behavior,and parameter modification.Results indicate that the proposed method effectively reduces the complexity and effort required for model updates and enhances the overall flexibility of system analysis.This study offers a generalizable paradigm for integrating model-based systems engineering with domain-specific simulation tools,laying the groundwork for subsequent highfidelity model replacement,trade-off analysis,and optimization-based design.
基金funded by the National Natural Science Foundation of China,grant number 52405341Foundation of National Key Laboratory of Computational Physics,grant number 6142A05QN24012+1 种基金Chongqing Science and Technology Committee,grant number CSTB2023NSCQ-MSX0363The Science and Technology Research Program of Chongqing Municipal Education Commission,grant number KJQN202301117.
文摘In materials science and engineering design,high-fidelity and high-efficiency numerical simulation has become a driving force for innovation and practical implementation.To address longstanding bottlenecks in the development of conventional material constitutive models—such as lengthy modeling cycles and difficulties in numerical implementation—this study proposes an intelligent modeling and code generation approach powered by large languagemodels.A structured knowledge base integrating constitutive theory,numerical algorithms,and UMAT(User Material)interface specifications is constructed,and a retrieval-augmented generation strategy is employed to establish an end-to-end workflow spanning experimental data parsing,constitutive model formulation,and automatic UMAT subroutine generation.Experimental results show that the method achieves high accuracy for both a classical Johnson–Cookmodel and a physics-informed neural network(PINN)model,with key parameter identification errors below 5%.Moreover,the automatically generated UMAT subroutines yield finite element simulation results in Abaqus that are highly consistent with theoretical predictions(coefficient of determination R2>0.98)while maintaining good numerical stability.This framework is currently focused on the automatic construction of rate-dependent elastoplastic material models,and its core method also provides a clear path for extending to other constitutive categories such as hyperelasticity and viscoelasticity.This work provides an effective technical route for the rapid development and reliable numerical implementation of material constitutive models,significantly advancing the intelligence level of computational mechanics research and improving engineering application efficiency.
基金supported by the National Key Research and Development Program of China(2023YFA1507601)the National Natural Science Foundation of China(22278127,22378038)+2 种基金the Fundamental Research Funds for the Central Universities(2022ZFJH004)the Shanghai Pilot Program for Basic Research(22T01400100-18)the Natural Science Foundation of Liaoning Province,China(2024-MSBA-15).
文摘Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. However, the development of such models requires specialized expertise in data science, limiting their broader application. Large language models (LLMs), such as GPT-4, have demonstrated potential in supporting and guiding research efforts. This work presents a novel AI-assisted framework where GPT-4, through well-engineered prompts, facilitates the construction and explanation of multi-objective neural networks. These models predict hydrotreating products properties (such as distillation range), including refined diesel and refined gas oil, using feedstock properties, operating conditions, and recycle hydrogen composition. Gradient-weighted class activation mapping was employed to identify key features influencing the output variables. This work illustrates an innovative AI-guided paradigm for chemical engineering applications, and the designed prompts hold promise for adaptation to other complex processes.
文摘Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartphones and increased Internet connectivity,SMS spam has emerged as a prevalent threat.Spammers have recognized the critical role SMS plays in today’s modern communication,making it a prime target for abuse.As cybersecurity threats continue to evolve,the volume of SMS spam has increased substantially in recent years.Moreover,the unstructured format of SMS data creates significant challenges for SMS spam detection,making it more difficult to successfully combat spam attacks.In this paper,we present an optimized and fine-tuned transformer-based Language Model to address the problem of SMS spam detection.We use a benchmark SMS spam dataset to analyze this spam detection model.Additionally,we utilize pre-processing techniques to obtain clean and noise-free data and address class imbalance problem by leveraging text augmentation techniques.The overall experiment showed that our optimized fine-tuned BERT(Bidirectional Encoder Representations from Transformers)variant model RoBERTa obtained high accuracy with 99.84%.To further enhance model transparency,we incorporate Explainable Artificial Intelligence(XAI)techniques that compute positive and negative coefficient scores,offering insight into the model’s decision-making process.Additionally,we evaluate the performance of traditional machine learning models as a baseline for comparison.This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
基金supported supported by the Fundamental Research Funds for the Central Universities(226-2024-00004)the National Natural Science Foundation of China(U23 A20326)Key Research and Development Program of Zhejiang Province(2025C01061).
文摘Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empowered by the large language model(LLM),we propose a novel multi-agent collaborative framework to streamline the end-to-end OPC UA IM modeling process.Each agent is equipped with meticulously engineered prompt templates,augmenting their capacity to execute specific tasks.We conduct modeling experiments using real textual data to demonstrate the effectiveness of the proposed method,improving modeling efficiency and reducing the labor workload.
文摘Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLMs.Therefore,in order to better assess the capability of LLMs in the agricultural domain,Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture.The assessment dataset used in Agri-Eval covered seven major disciplines in the agricultural domain:crop science,horticulture,plant protection,animal husbandry,forest science,aquaculture science,and grass science,and contained a total of 2283 questions.Among domestic general-purpose LLMs,DeepSeek R1 performed best with an accuracy rate of 75.49%.In the realm of international general-purpose LLMs,Gemini 2.0 pro exp 0205 standed out as the top performer,achieving an accuracy rate of 74.28%.As an LLMs in agriculture vertical,Shennong V2.0 outperformed all the LLMs in China,and the answer accuracy rate of agricultural knowledge exceeded that of all the existing general-purpose LLMs.The launch of Agri-Eval helped the LLM developers to comprehensively evaluate the model's capability in the field of agriculture through a variety of tasks and tests to promote the development of the LLMs in the field of agriculture.
基金funded by the National Key Research and Development Program of China(No.2021YFA1100500)the National Natural Science Foundation of China(No.82370662)the Key Research&Development Plan of Zhejiang Province(No.2024C03051).
文摘This study evaluated the accuracy,completeness,and comprehensibility of responses from mainstream large language models(LLMs)to hepatitis C virus(HCV)-related questions,aiming to assess their performance in addressing patient queries about disease and lifestyle behaviors.The models selected were ChatGPT-4o,Gemini 2.0 Pro,Claude 3.5 Sonnet,and DeepSeek V3,with 12 questions chosen by two HCV experts from the domains of prevention,diagnosis,and treatment.
基金supported by the National Key R&D Program of China[2022YFF0902703]the State Administration for Market Regulation Science and Technology Plan Project(2024MK033).
文摘Recommendation systems are key to boosting user engagement,satisfaction,and retention,particularly on media platforms where personalized content is vital.Sequential recommendation systems learn from user-item interactions to predict future items of interest.However,many current methods rely on unique user and item IDs,limiting their ability to represent users and items effectively,especially in zero-shot learning scenarios where training data is scarce.With the rapid development of Large Language Models(LLMs),researchers are exploring their potential to enhance recommendation systems.However,there is a semantic gap between the linguistic semantics of LLMs and the collaborative semantics of recommendation systems,where items are typically indexed by IDs.Moreover,most research focuses on item representations,neglecting personalized user modeling.To address these issues,we propose a sequential recommendation framework using LLMs,called CIT-Rec,a model that integrates Collaborative semantics for user representation and Image and Text information for item representation to enhance Recommendations.Specifically,by aligning intuitive image information with text containing semantic features,we can more accurately represent items,improving item representation quality.We focus not only on item representations but also on user representations.To more precisely capture users’personalized preferences,we use traditional sequential recommendation models to train on users’historical interaction data,effectively capturing behavioral patterns.Finally,by combining LLMs and traditional sequential recommendation models,we allow the LLM to understand linguistic semantics while capturing collaborative semantics.Extensive evaluations on real-world datasets show that our model outperforms baseline methods,effectively combining user interaction history with item visual and textual modalities to provide personalized recommendations.
文摘Classifying job offers into occupational categories is a fundamental task in human resource information systems,as it improves and streamlines indexing,search,and matching between openings and job seekers.Comprehensive occupational databases such as O∗NET or ESCO provide detailed taxonomies of interrelated positions that can be leveraged to align the textual content of postings with occupational categories,thereby facilitating standardization,cross-system interoperability,and access to metadata for each occupation(e.g.,tasks,knowledge,skills,and abilities).In this work,we explore the effectiveness of fine-tuning existing language models(LMs)to classify job offers with occupational descriptors from O∗NET.This enables a more precise assessment of candidate suitability by identifying the specific knowledge and skills required for each position,and helps automate recruitment processes by mitigating human bias and subjectivity in candidate selection.We evaluate three representative BERT-like models:BERT,RoBERTa,and DeBERTa.BERT serves as the baseline encoder-only architecture;RoBERTa incorporates advances in pretraining objectives and data scale;and DeBERTa introduces architectural improvements through disentangled attention mechanisms.The best performance was achieved with the DeBERTa model,although the other models also produced strong results,and no statistically significant differences were observed acrossmodels.We also find that these models typically reach optimal performance after only a few training epochs,and that training with smaller,balanced datasets is effective.Consequently,comparable results can be obtained with models that require fewer computational resources and less training time,facilitating deployment and practical use.
基金supported by University Grant Agency of Matej Bel University in Banská Bystrica project number UGA-14-PDS-2025.
文摘It is known that correlation does not imply causality.Some relationships identified in the analysis of data are coincidental or unknown,and some are produced by real-world causality of the situation,which is problematic,since there is a need to differentiate between these two scenarios.Until recently,the proper−semantic−causality of the relationship could have been determined only by human experts from the area of expertise of the studied data.This has changed with the advance of large language models,which are often utilized as surrogates for such human experts,making the process automated and readily available to all data analysts.This motivates the main objective of this work,which is to introduce the design and implementation of a large language model-based semantic causality evaluator based on correlation analysis,together with its visual analysis model called Causal heatmap.After the implementation itself,the model is evaluated from the point of view of the quality of the visual model,from the point of view of the quality of causal evaluation based on large language models,and from the point of view of comparative analysis,while the results reached in the study highlight the usability of large language models in the task and the potential of the proposed approach in the analysis of unknown datasets.The results of the experimental evaluation demonstrate the usefulness of the Causal heatmap method,supported by the evident highlighting of interesting relationships,while suppressing irrelevant ones.
文摘Background:Assess ChatGPT and Bard's effectiveness in the initial identification of articles for Otolaryngology—Head and Neck Surgery systematic literature reviews.Methods:Three PRISMA-based systematic reviews(Jabbour et al.2017,Wong et al.2018,and Wu et al.2021)were replicated using ChatGPTv3.5 and Bard.Outputs(author,title,publication year,and journal)were compared to the original references and cross-referenced with medical databases for authenticity and recall.Results:Several themes emerged when comparing Bard and ChatGPT across the three reviews.Bard generated more outputs and had greater recall in Wong et al.'s review,with a broader date range in Jabbour et al.'s review.In Wu et al.'s review,ChatGPT-2 had higher recall and identified more authentic outputs than Bard-2.Conclusion:Large language models(LLMs)failed to fully replicate peer-reviewed methodologies,producing outputs with inaccuracies but identifying relevant,especially recent,articles missed by the references.While human-led PRISMA-based reviews remain the gold standard,refining LLMs for literature reviews shows potential.
基金funded by the Office of the Vice-President for Research and Development of Cebu Technological University.
文摘This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.
文摘Knowledge distillation has become a standard technique for compressing large language models into efficient student models,but existing methods often struggle to balance prediction accuracy with explanation quality.Recent approaches such as Distilling Step-by-Step(DSbS)introduce explanation supervision,yet they apply it in a uniform manner that may not fully exploit the different learning dynamics of prediction and explanation.In this work,we propose a task-structured curriculum learning(TSCL)framework that structures training into three sequential phases:(i)prediction-only,to establish stable feature representations;(ii)joint prediction-explanation,to align task outputs with rationale generation;and(iii)explanation-only,to refine the quality of rationales.This design provides a simple but effective modification to DSbS,requiring no architectural changes and adding negligible training cost.We justify the phase scheduling with ablation studies and convergence analysis,showing that an initial prediction-heavy stage followed by a balanced joint phase improves both stability and explanation alignment.Extensive experiments on five datasets(e-SNLI,ANLI,CommonsenseQA,SVAMP,and MedNLI)demonstrate that TSCL consistently outperforms strong baselines,achieving gains of+1.7-2.6 points in accuracy and 0.8-1.2 in ROUGE-L,corresponding to relative error reductions of up to 21%.Beyond lexical metrics,human evaluation and ERASERstyle faithfulness diagnostics confirm that TSCL produces more faithful and informative explanations.Comparative training curves further reveal faster convergence and lower variance across seeds.Efficiency analysis shows less than 3%overhead in wall-clock training time and no additional inference cost,making the approach practical for realworld deployment.This study demonstrates that a simple task-structured curriculum can significantly improve the effectiveness of knowledge distillation.By separating and sequencing objectives,TSCL achieves a better balance between accuracy,stability,and explanation quality.The framework generalizes across domains,including medical NLI,and offers a principled recipe for future applications in multimodal reasoning and reinforcement learning.
文摘Background:Despite the promise shown by large language models(LLMs)for standardized tasks,their multidimensional performance in real-world oncology decision-making remains unevaluated.This study aims to introduce a framework for evaluating LLMs and physician decisions in challenging lung cancer cases.Methods:We curated 50 challenging lung cancer cases(25 local and 25 published)classified as complex,rare,or refractory.Blinded three-dimensional,five-point Likert evaluations(1–5 for comprehensiveness,specificity,and readability)compared standalone LLMs(DeepSeek R1,Claude 3.5,Gemini 1.5,and GPT-4o),physicians by experience level(junior,intermediate,and senior),and AI-assisted juniors;intergroup differences and augmentation effects were analyzed statistically.Results:Of 50 challenging cases(18 complex,17 rare,and 15 refractory)rated by three experts,DeepSeek R1 achieved scores of 3.95±0.33,3.71±0.53,and 4.26±0.18 for comprehensiveness,specificity,and readability,respectively,positioning it between intermediate(3.68,3.68,3.75)and senior(4.50,4.64,4.53)physicians.GPT-4o and Claude 3.5 reached intermediate physician–level comprehensiveness(3.76±0.39,3.60±0.39)but junior-to-intermediate physician–level specificity(3.39±0.39,3.39±0.49).All LLMs scored higher on rare cases than intermediate physicians but fell below junior physicians in refractory-case specificity.AIassisted junior physicians showed marked gains in rare cases,with comprehensiveness rising from 2.32 to 4.29(84.8%),specificity from 2.24 to 4.26(90.8%),and readability from 2.76 to 4.59(66.0%),while specificity declined by 3.2%(3.17 to 3.07)in refractory cases.Error analysis showed complementary strengths,with physicians demonstrating reasoning stability and LLMs excelling in knowledge updating and risk management.Conclusions:LLMs performed variably in clinical decision-making tasks depending on case type,performing better in rare cases and worse in refractory cases requiring longitudinal reasoning.Complementary strengths between LLMs and physicians support case-and task-tailored human–AI collaboration.
基金supported by the National Science and Technology Council(NSTC),Taiwan,under grant number 114-2221-E-182-041-MY3by Chang Gung University and Chang Gung Memorial Hospital under project number NERPD4Q0021.
文摘The outstanding growth in the applications of large language models(LLMs)demonstrates the significance of adaptive and efficient prompt engineering tactics.The existing methods may not be variable,vigorous and streamlined in different domains.The offered study introduces an immediate optimization outline,named PROMPTx-PE,that is going to yield a greater level of precision and strength when it comes to the assignments that are premised on LLM.The proposed systemfeatures a timely selection schemewhich is informed by reinforcement learning,a contextual layer and a dynamic weighting module which is regulated by Lyapunov-based stability guidelines.The PROMPTx-PE dynamically varies the exploration and exploitation of the prompt space,depending on real-time feedback and multi-objective reward development.Extensive testing on both benchmark(GLUE,SuperGLUE)and domain-specific data(Healthcare-QA and Industrial-NER)demonstrates a large best performance to be 89.4%and a strong robustness disconnect with under 3%computation expense.The results confirm the effectiveness,consistency,and scalability of PROMPTx-PE as a platform of adaptive prompt engineering based on recent uses of LLMs.
文摘War rehearsals have become increasingly important in national security due to the growing complexity of international affairs.However,traditional rehearsal methods,such as military chess simulations,are inefficient and inflexible,with particularly pronounced limitations in command and decision-making.The overwhelming volume of information and high decision complexity hinder the realization of autonomous and agile command and control.To address this challenge,an intelligent warfare simulation framework named Command-Agent is proposed,which deeply integrates large language models(LLMs)with digital twin battlefields.By constructing a highly realistic battlefield environment through real-time simulation and multi-source data fusion,the natural language interaction capabilities of LLMs are leveraged to lower the command threshold and to enable autonomous command through the Observe-Orient-Decide-Act(OODA)feedback loop.Within the Command-Agent framework,a multimodel collaborative architecture is further adopted to decouple the decision-generation and command-execution functions of LLMs.By combining specialized models such as Deep Seek-R1 and MCTool,the limitations of single-model capabilities are overcome.MCTool is a lightweight execution model fine-tuned for military Function Calling tasks.The framework also introduces a Vector Knowledge Base to mitigate hallucinations commonly exhibited by LLMs.Experimental results demonstrate that Command-Agent not only enables natural language-driven simulation and control but also deeply understands commander intent.Leveraging the multi-model collaborative architecture,during red-blue UAV confrontations involving 2 to 8 UAVs,the integrated score is improved by an average of 41.8%compared to the single-agent system(MCTool),accompanied by a 161.8%optimization in the battle loss ratio.Furthermore,when compared with multi-agent systems lacking the knowledge base,the inclusion of the Vector Knowledge Base further improves overall performance by 16.8%.In comparison with the general model(Qwen2.5-7B),the fine-tuned MCTool leads by 5%in execution efficiency.Therefore,the proposed Command-Agent introduces a novel perspective to the military command system and offers a feasible solution for intelligent battlefield decision-making.
基金supported in part by the National Natural Science Foundation of China(NSFC)under Grant 62276109The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through the Research Group Project number(ORF-2025-585).
文摘The personalized fine-tuning of large languagemodels(LLMs)on edge devices is severely constrained by limited computation resources.Although split federated learning alleviates on-device burdens,its effectiveness diminishes in few-shot reasoning scenarios due to the low data efficiency of conventional supervised fine-tuning,which leads to excessive communication overhead.To address this,we propose Language-Empowered Split Fine-Tuning(LESFT),a framework that integrates split architectures with a contrastive-inspired fine-tuning paradigm.LESFT simultaneously learns frommultiple logically equivalent but linguistically diverse reasoning chains,providing richer supervisory signals and improving data efficiency.This process-oriented training allows more effective reasoning adaptation with fewer samples.Extensive experiments demonstrate that LESFT consistently outperforms strong baselines such as SplitLoRA in task accuracy.LESFT consistently outperforms strong baselines on GSM8K,CommonsenseQA,and AQUA_RAT,with the largest gains observed on Qwen2.5-3B.These results indicate that LESFT can effectively adapt large language models for reasoning tasks under the computational and communication constraints of edge environments.
基金supported by the National Natural Science Foundation of China(32471964)。
文摘The collection and annotation of lar ge-scale bird datasets are resource-intensive and time-consuming processes that significantly limit the scalability and accuracy of biodiversity monitoring systems.While self-supervised learning(SSL)has emerged as a promising approach for leveraging unannotated data,current SSL methods face two critical challenges in bird species recognition:(1)long-tailed data distributions that result in poor performance on underrepresented species;and(2)domain shift issues caused by data augmentation strategies designed to mitigate class imbalance.Here we present SDNet,a novel SSL-based bird recognition framework that integrates diffusion models with large language models(LLMs)to overcome these limitations.SDNet employs LLMs to generate semantically rich textual descriptions for tail-class species by prompting the models with species taxonomy,morphological attributes,and habitat information,producing detailed natural language priors that capture fine-grained visual characteristics(e.g.,plumage patterns,body proportions,and distinctive markings).These textual descriptions are subsequently used by a conditional diffusion model to synthesize new bird image samples through cross-attention mechanisms that fuse textual embeddings with intermediate visual feature representations during the denoising process,ensuring generated images preserve species-specific morphological details while maintaining photorealistic quality.Additionally,we incorporate a Swin Transformer as the feature extraction backbone whose hierarchical window-based attention mechanism and shifted windowing scheme enable multi-scale local feature extraction that proves particularly effective at capturing finegrained discriminative patterns(such as beak shape and feather texture)while mitigating domain shift between synthetic and original images through consistent feature representations across both data sources.SDNet is validated on both a self-constructed dataset(Bird_BXS)an d a publicly available benchmark(Birds_25),demonstrating substantial improvements over conventional SSL approaches.Our results indicate that the synergistic integration of LLMs,diffusion models,and the Swin Transformer architecture contributes significantly to recognition accuracy,particularly for rare and morphologically similar species.These findings highlight the potential of SDNet for addressing fundamental limitations of existing SSL methods in avian recognition tasks and establishing a new paradigm for efficient self-supervised learning in large-scale ornithological vision applications.
基金Supported by the High-Level Research Fund(12225000404)。
文摘API(Application Programming Interface)documentation often only describes individual APIs and lacks information on complex API relations and code examples.Retrieval-based and generation-based methods can both produce documentation that includes API relationship descriptions and code examples.However,they are limited by the richness of available API resources.As a result,they struggle to be effective when dealing with resource-scarce languages such as Kotlin.We propose an on-demand API tutorial generation method for resource-scarce languages,transferring API knowledge from a resource-rich language like Java to Kotlin using an AI chain.Evaluating our method on 500 Kotlin APIs,we generated more API documents than the state-of-the-art retrieval-based method ADECK and the generate-based method gDoc.The number of API guidelines generated by our method is 37 times that of ADECK and 1.6 times that of gDoc.Compared with the scheme that did not adopt the knowledge transfer strategy,the success rate of our method has increased by 31.25 percentage points.This demonstrates the feasibility and potential of using LLMs to create new API knowledge across languages.