With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more gener...With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more general constructs.Based on the profile mechanism of unified modeling language(UML) 2.2,a kind of DSML is presented to model simulation testing systems of avionic software(STSAS).To define the syntax,semantics and notions of the DSML,the domain model of the STSAS from which we generalize the domain concepts and relationships among these concepts is given,and then,the domain model is mapped into a UML meta-model,named UML-STSAS profile.Assuming a flight control system(FCS) as system under test(SUT),we design the relevant STSAS.The results indicate that extending UML to the simulation testing domain can effectively and precisely model STSAS.展开更多
Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. Howeve...Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. However, the development of such models requires specialized expertise in data science, limiting their broader application. Large language models (LLMs), such as GPT-4, have demonstrated potential in supporting and guiding research efforts. This work presents a novel AI-assisted framework where GPT-4, through well-engineered prompts, facilitates the construction and explanation of multi-objective neural networks. These models predict hydrotreating products properties (such as distillation range), including refined diesel and refined gas oil, using feedstock properties, operating conditions, and recycle hydrogen composition. Gradient-weighted class activation mapping was employed to identify key features influencing the output variables. This work illustrates an innovative AI-guided paradigm for chemical engineering applications, and the designed prompts hold promise for adaptation to other complex processes.展开更多
Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empow...Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empowered by the large language model(LLM),we propose a novel multi-agent collaborative framework to streamline the end-to-end OPC UA IM modeling process.Each agent is equipped with meticulously engineered prompt templates,augmenting their capacity to execute specific tasks.We conduct modeling experiments using real textual data to demonstrate the effectiveness of the proposed method,improving modeling efficiency and reducing the labor workload.展开更多
Recent advancements in large language models(LLMs)have driven remarkable progress in text process-ing,opening new avenues for medical knowledge discovery.In this study,we present ERQA,a mEdical knowledge Retrieval and...Recent advancements in large language models(LLMs)have driven remarkable progress in text process-ing,opening new avenues for medical knowledge discovery.In this study,we present ERQA,a mEdical knowledge Retrieval and Question-Answering framework powered by an enhanced LLM that integrates a semantic vector database and a curated literature repository.The ERQA framework leverages domain-specific incremental pretraining and conducts supervised fine-tuning on medical literature,enabling retrieval and question-answering(QA)tasks to be completed with high precision.Performance evaluations implemented on the coronavirus disease 2019(COVID-19)and TripClick data-sets demonstrate the robust capabilities of ERQA across multiple tasks.On the COVID-19 dataset,ERQA-13B achieves state-of-the-art retrieval metrics,with normalized discounted cumulative gain at top 10(NDCG@10)0.297,recall values at top 10(Recall@10)0.347,and mean reciprocal rank(MRR)=0.370;it also attains strong abstract summarization performance,with a recall-oriented understudy for gisting evaluation(ROUGE)-1 score of 0.434,and QA performance,with a bilingual evaluation understudy(BLEU)-1 score of 7.851.The comparable performance achieved on the TripClick dataset further under-scores the adaptability of ERQA across diverse medical topics.These findings suggest that ERQA repre-sents a significant step toward efficient biomedical knowledge retrieval and QA.展开更多
Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartph...Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartphones and increased Internet connectivity,SMS spam has emerged as a prevalent threat.Spammers have recognized the critical role SMS plays in today’s modern communication,making it a prime target for abuse.As cybersecurity threats continue to evolve,the volume of SMS spam has increased substantially in recent years.Moreover,the unstructured format of SMS data creates significant challenges for SMS spam detection,making it more difficult to successfully combat spam attacks.In this paper,we present an optimized and fine-tuned transformer-based Language Model to address the problem of SMS spam detection.We use a benchmark SMS spam dataset to analyze this spam detection model.Additionally,we utilize pre-processing techniques to obtain clean and noise-free data and address class imbalance problem by leveraging text augmentation techniques.The overall experiment showed that our optimized fine-tuned BERT(Bidirectional Encoder Representations from Transformers)variant model RoBERTa obtained high accuracy with 99.84%.To further enhance model transparency,we incorporate Explainable Artificial Intelligence(XAI)techniques that compute positive and negative coefficient scores,offering insight into the model’s decision-making process.Additionally,we evaluate the performance of traditional machine learning models as a baseline for comparison.This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.展开更多
Objective:Generative artificial intelligence(AI)technology,represented by large language models(LLMs),has gradually been developed for traditional Chinese medicine(TCM);however,challenges remain in effectively enhanci...Objective:Generative artificial intelligence(AI)technology,represented by large language models(LLMs),has gradually been developed for traditional Chinese medicine(TCM);however,challenges remain in effectively enhancing AI applications for TCM.Therefore,this study is the first systematic review to analyze LLMs in TCM retrospectively,focusing on and summarizing the evidence of their performance in generative tasks.Methods:We extensively searched electronic databases for articles published until June 2024 to identify publicly available studies on LLMs in TCM.Two investigators independently selected and extracted the related information and evaluation metrics.Based on the available data,this study used descriptive analysis for a comprehensive systematic review of LLM technology related to TCM.Results:Ten studies published between 2023 and 2024 met our eligibility criteria and were included in this review,including 40%LLMs in the TCM vertical domain,40%containing TCM data,and 20%honoring the TCM contribution,with a foundational model parameter range from 1.8 to 33 billion.All included studies used manual or automatic evaluation metrics to evaluate model performance and fully discussed the challenges and contributions through an overview of LLMs in TCM.Conclusions:LLMs have achieved significant advantages in TCM applications and can effectively address intelligent TCM tasks.Further in-depth development of LLMs is needed in various vertical TCM fields,including clinical and fundamental research.Focusing on the functional segmentation development direction of generative AI technologies in TCM application scenarios to meet the practical needs-oriented demands of TCM digitalization is essential.展开更多
In the era of big data,data-driven technologies are increasingly leveraged by industry to facilitate autonomous learning and intelligent decision-making.However,the challenge of“small samples in big data”emerges whe...In the era of big data,data-driven technologies are increasingly leveraged by industry to facilitate autonomous learning and intelligent decision-making.However,the challenge of“small samples in big data”emerges when datasets lack the comprehensive information necessary for addressing complex scenarios,which hampers adaptability.Thus,enhancing data completeness is essential.Knowledge-guided virtual sample generation transforms domain knowledge into extensive virtual datasets,thereby reducing dependence on limited real samples and enabling zero-sample fault diagnosis.This study used building air conditioning systems as a case study.We innovatively used the large language model(LLM)to acquire domain knowledge for sample generation,significantly lowering knowledge acquisition costs and establishing a generalized framework for knowledge acquisition in engineering applications.This acquired knowledge guided the design of diffusion boundaries in mega-trend diffusion(MTD),while the Monte Carlo method was used to sample within the diffusion function to create information-rich virtual samples.Additionally,a noise-adding technique was introduced to enhance the information entropy of these samples,thereby improving the robustness of neural networks trained with them.Experimental results showed that training the diagnostic model exclusively with virtual samples achieved an accuracy of 72.80%,significantly surpassing traditional small-sample supervised learning in terms of generalization.This underscores the quality and completeness of the generated virtual samples.展开更多
Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLM...Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLMs.Therefore,in order to better assess the capability of LLMs in the agricultural domain,Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture.The assessment dataset used in Agri-Eval covered seven major disciplines in the agricultural domain:crop science,horticulture,plant protection,animal husbandry,forest science,aquaculture science,and grass science,and contained a total of 2283 questions.Among domestic general-purpose LLMs,DeepSeek R1 performed best with an accuracy rate of 75.49%.In the realm of international general-purpose LLMs,Gemini 2.0 pro exp 0205 standed out as the top performer,achieving an accuracy rate of 74.28%.As an LLMs in agriculture vertical,Shennong V2.0 outperformed all the LLMs in China,and the answer accuracy rate of agricultural knowledge exceeded that of all the existing general-purpose LLMs.The launch of Agri-Eval helped the LLM developers to comprehensively evaluate the model's capability in the field of agriculture through a variety of tasks and tests to promote the development of the LLMs in the field of agriculture.展开更多
This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to use...This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.展开更多
载人航天器研制过程中,人因要素在早期阶段融入设计仍有待提升,且常规的基于模型的系统工程(model based systems engineering,MBSE)体系缺少将人与系统其余部分进行整合的充分考虑,导致开发迭代周期变长,也大幅增加了研制成本。针对这...载人航天器研制过程中,人因要素在早期阶段融入设计仍有待提升,且常规的基于模型的系统工程(model based systems engineering,MBSE)体系缺少将人与系统其余部分进行整合的充分考虑,导致开发迭代周期变长,也大幅增加了研制成本。针对这一问题,提出载人月球探测任务人因领域元模型构建方法,在人-系统整合的框架下,采用MBSE将人因需求整合至载人航天器的开发过程中,并基于系统建模语言SysML建立人因领域元模型,以实现在载人月球探测产品开发的全生命周期中融入人因需求,为产品的规划、设计和开发提供支持,有效减少研制中出现人因设计问题,降低研制成本。通过载人月球探测任务的典型案例进行建模,验证人因领域元模型建立方法的有效性,为类似系统设计的MBSE扩展应用提供参考。展开更多
Whether or not a software system satisfies the anticipated user requirements is ultimately determined by the behaviors of the software. So it is necessary and valuable to research requirements modeling language and te...Whether or not a software system satisfies the anticipated user requirements is ultimately determined by the behaviors of the software. So it is necessary and valuable to research requirements modeling language and technique from the perspective of behavior. This paper presents a lightweight behavior based requirements modeling language BDL with formal syntax and semantics, and a general-purpose requirements description model BRM synthesizing the concepts of viewpoint and scenario. BRM is good for modeling large and complex system due to its structure is very clear. In addition, the modeling process is demonstrated through the case study On-Line Campus Management System. By lightweight formal style, BDL & BRM can effectively bridge the gap between practicability and rigorousness of formal requirements modeling language and technique.展开更多
A language model for information retrieval is built by using a query language model to generate queries and a document language model to generate documents. The documents are ranked according to the relative entropies...A language model for information retrieval is built by using a query language model to generate queries and a document language model to generate documents. The documents are ranked according to the relative entropies of estimated document language models with respect to the estimated query language model. Two popular and relatively efficient smoothing methods, the Jelinek- Mercer method and the absolute discounting method, are used to smooth the document language model in estimation of the document language, A combined model composed of the feedback document language model and the collection language model is used to estimate the query model. A performacne comparison between the new retrieval method and the existing method with feedback is made, and the retrieval performances of the proposed method with the two different smoothing techniques are evaluated on three Text Retrieval Conference (TREC) data sets. Experimental results show that the method is effective and performs better than the basic language modeling approach; moreover, the method using the Jelinek-Mercer technique performs better than that using the absolute discounting technique, and the perfomance is sensitive to the smoothing peramters.展开更多
Understanding language competition and extinction is an interdisciplinary challenge, and math models provide a tool for interpreting linguistic census data and possibly predict the language shift trend at the populati...Understanding language competition and extinction is an interdisciplinary challenge, and math models provide a tool for interpreting linguistic census data and possibly predict the language shift trend at the population scale. In this study, new data from previously examined areas were modeled, specifically Catalan and Spanish in Catalonia, Spanish and English in Houston, Texas, Dutch and French in Brussels, Euskera and Spanish in Spain and French and English in Canada. Three mathematical models of the language competition have been validated. The first is the Abrams-Strogatz model, which treats populations as having two monolingual groups. The second is the Castelló model, which considers bilingual speakers. The third is the Mira model, which considers language competition when the two languages have high similarities. It was found that the some of the data matched Abrams-Strogatz original model, but some divergences could still be addressed. It was also found that the Mira model needs some improvement in how it treats the differences between languages.展开更多
A mode of ontology-based information integration and management( OIIM) for testability scheme was proposed through expatiating on the connotation of the system testability scheme.Aiming at the complexity of influencin...A mode of ontology-based information integration and management( OIIM) for testability scheme was proposed through expatiating on the connotation of the system testability scheme.Aiming at the complexity of influencing factors in optimal design procedure of the testability scheme, the information of concept entities,concept attributions and concept relationships was analyzed and extracted,and then the testability scheme information ontology( TSIO) was built and coded via web ontology language( OWL).Based on the information ontology, the generalized model for testability scheme( GMTS) was founded by defining transformation rules. The primary study shows that the mode of OIIM for testability scheme can make up the deficiencies in knowledge representation and reasoning existing in traditional information models,and achieve the information share and reuse. It provides the effectual model basis for the optimal design of the testability scheme.展开更多
Domain-specific metamodeling language(DSMML) defined by informal method cannot strictly represent its structural semantics,so its properties such as consistency cannot be holistically and systematically verified.In re...Domain-specific metamodeling language(DSMML) defined by informal method cannot strictly represent its structural semantics,so its properties such as consistency cannot be holistically and systematically verified.In response,the paper proposes a formal representation of the structural semantics of DSMML named extensible markup language(XML) based metamodeling language(XMML) and its metamodels consistency verification method.Firstly,we describe our approach of formalization,based on this,the method of consistency verification of XMML and its metamodels based on first-order logical inference is presented;then,the formalization automatic mapping engine for metamodels is designed to show the feasibility of our formal method.展开更多
Language is a special social phenomenon and is always on the changing process with the development of society. During the evolving process of language, new language varieties will continuously emerge due to the change...Language is a special social phenomenon and is always on the changing process with the development of society. During the evolving process of language, new language varieties will continuously emerge due to the changes of some social and cultural factors. Cyber language is universally accepted as one type of the social language varieties. Basically, cyber language can be treated as a complex adaptive system which is influenced by the interaction between users’ cognition, social culture and the surrounding environments. Thus it is safe to say that cyber language is always undergoing a dynamic evolving process. With the usage-based language model as the theoretical foundation, this paper proposes a Complex Adaptive System (CAS) approach to analyze the expression of Appreciation to explore the complex, dynamic and nonlinear development of cyber language from the angle of meaning construction, grammaticalization and functional adaption respectively. It is found that the expression of Appreciation is experiencing adaptively a semantic connotations development and a process of grammatical functions expansion as well. This paper suggests that the emergence and development of cyber language is a novel and trendy social language phenomenon. Network language can achieve its process and evolution under the huge impact of social changes and social promotions. When faced with the changing surroundings, cyber language itself enjoys a timely adaption and responsive development to keep up with the new environments, which reflects the basic principle of language development, namely, language changes with the development of society.展开更多
This paper informally introduces colored object-oriented Petri Nets(COOPN) with the application of the AUV system.According to the characteristic of the AUV system's running environment,the object-oriented method ...This paper informally introduces colored object-oriented Petri Nets(COOPN) with the application of the AUV system.According to the characteristic of the AUV system's running environment,the object-oriented method is used in this paper not only to dispart system modules but also construct the refined running model of AUV system,then the colored Petri Net method is used to establish hierarchically detailed model in order to get the performance analyzing information of the system.After analyzing the model implementation,the errors of architecture designing and function realization can be found.If the errors can be modified on time,the experiment time in the pool can be reduced and the cost can be saved.展开更多
基金Aeronautical Science Foundation of China (20095551025)
文摘With direct expression of individual application domain patterns and ideas,domain-specific modeling language(DSML) is more and more frequently used to build models instead of using a combination of one or more general constructs.Based on the profile mechanism of unified modeling language(UML) 2.2,a kind of DSML is presented to model simulation testing systems of avionic software(STSAS).To define the syntax,semantics and notions of the DSML,the domain model of the STSAS from which we generalize the domain concepts and relationships among these concepts is given,and then,the domain model is mapped into a UML meta-model,named UML-STSAS profile.Assuming a flight control system(FCS) as system under test(SUT),we design the relevant STSAS.The results indicate that extending UML to the simulation testing domain can effectively and precisely model STSAS.
基金supported by the National Key Research and Development Program of China(2023YFA1507601)the National Natural Science Foundation of China(22278127,22378038)+2 种基金the Fundamental Research Funds for the Central Universities(2022ZFJH004)the Shanghai Pilot Program for Basic Research(22T01400100-18)the Natural Science Foundation of Liaoning Province,China(2024-MSBA-15).
文摘Data-driven approaches are extensively employed to model complex chemical engineering processes, such as hydrotreating, to address the challenges of mechanism-based methods demanding deep process understanding. However, the development of such models requires specialized expertise in data science, limiting their broader application. Large language models (LLMs), such as GPT-4, have demonstrated potential in supporting and guiding research efforts. This work presents a novel AI-assisted framework where GPT-4, through well-engineered prompts, facilitates the construction and explanation of multi-objective neural networks. These models predict hydrotreating products properties (such as distillation range), including refined diesel and refined gas oil, using feedstock properties, operating conditions, and recycle hydrogen composition. Gradient-weighted class activation mapping was employed to identify key features influencing the output variables. This work illustrates an innovative AI-guided paradigm for chemical engineering applications, and the designed prompts hold promise for adaptation to other complex processes.
基金supported supported by the Fundamental Research Funds for the Central Universities(226-2024-00004)the National Natural Science Foundation of China(U23 A20326)Key Research and Development Program of Zhejiang Province(2025C01061).
文摘Dear Editor,This letter deals with automatically constructing an OPC UA information model(IM)aimed at enhancing data interoperability among heterogeneous system components within manufacturing automation systems.Empowered by the large language model(LLM),we propose a novel multi-agent collaborative framework to streamline the end-to-end OPC UA IM modeling process.Each agent is equipped with meticulously engineered prompt templates,augmenting their capacity to execute specific tasks.We conduct modeling experiments using real textual data to demonstrate the effectiveness of the proposed method,improving modeling efficiency and reducing the labor workload.
基金supported by the Innovation Fund for Medical Sciences of the Chinese Academy of Medical Sciences(2021-I2M-1-033)the National Key Research and Development Program of China(2022YFF0711900).
文摘Recent advancements in large language models(LLMs)have driven remarkable progress in text process-ing,opening new avenues for medical knowledge discovery.In this study,we present ERQA,a mEdical knowledge Retrieval and Question-Answering framework powered by an enhanced LLM that integrates a semantic vector database and a curated literature repository.The ERQA framework leverages domain-specific incremental pretraining and conducts supervised fine-tuning on medical literature,enabling retrieval and question-answering(QA)tasks to be completed with high precision.Performance evaluations implemented on the coronavirus disease 2019(COVID-19)and TripClick data-sets demonstrate the robust capabilities of ERQA across multiple tasks.On the COVID-19 dataset,ERQA-13B achieves state-of-the-art retrieval metrics,with normalized discounted cumulative gain at top 10(NDCG@10)0.297,recall values at top 10(Recall@10)0.347,and mean reciprocal rank(MRR)=0.370;it also attains strong abstract summarization performance,with a recall-oriented understudy for gisting evaluation(ROUGE)-1 score of 0.434,and QA performance,with a bilingual evaluation understudy(BLEU)-1 score of 7.851.The comparable performance achieved on the TripClick dataset further under-scores the adaptability of ERQA across diverse medical topics.These findings suggest that ERQA repre-sents a significant step toward efficient biomedical knowledge retrieval and QA.
文摘Short Message Service(SMS)is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages-commonly known as SMS spam.With the rapid adoption of smartphones and increased Internet connectivity,SMS spam has emerged as a prevalent threat.Spammers have recognized the critical role SMS plays in today’s modern communication,making it a prime target for abuse.As cybersecurity threats continue to evolve,the volume of SMS spam has increased substantially in recent years.Moreover,the unstructured format of SMS data creates significant challenges for SMS spam detection,making it more difficult to successfully combat spam attacks.In this paper,we present an optimized and fine-tuned transformer-based Language Model to address the problem of SMS spam detection.We use a benchmark SMS spam dataset to analyze this spam detection model.Additionally,we utilize pre-processing techniques to obtain clean and noise-free data and address class imbalance problem by leveraging text augmentation techniques.The overall experiment showed that our optimized fine-tuned BERT(Bidirectional Encoder Representations from Transformers)variant model RoBERTa obtained high accuracy with 99.84%.To further enhance model transparency,we incorporate Explainable Artificial Intelligence(XAI)techniques that compute positive and negative coefficient scores,offering insight into the model’s decision-making process.Additionally,we evaluate the performance of traditional machine learning models as a baseline for comparison.This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
基金supported by the National Multidisciplinary Innovation Team of Traditional Chinese Medicine(ZYYCXTD-D-202204)China Postdoctoral Science Foundation(2023M742627)+1 种基金Postdoctoral Fellowship Program of CPSF(GZC20231928)Foundation of State Key Laboratory of Component-based Chinese Medicine(CBCM2023201).
文摘Objective:Generative artificial intelligence(AI)technology,represented by large language models(LLMs),has gradually been developed for traditional Chinese medicine(TCM);however,challenges remain in effectively enhancing AI applications for TCM.Therefore,this study is the first systematic review to analyze LLMs in TCM retrospectively,focusing on and summarizing the evidence of their performance in generative tasks.Methods:We extensively searched electronic databases for articles published until June 2024 to identify publicly available studies on LLMs in TCM.Two investigators independently selected and extracted the related information and evaluation metrics.Based on the available data,this study used descriptive analysis for a comprehensive systematic review of LLM technology related to TCM.Results:Ten studies published between 2023 and 2024 met our eligibility criteria and were included in this review,including 40%LLMs in the TCM vertical domain,40%containing TCM data,and 20%honoring the TCM contribution,with a foundational model parameter range from 1.8 to 33 billion.All included studies used manual or automatic evaluation metrics to evaluate model performance and fully discussed the challenges and contributions through an overview of LLMs in TCM.Conclusions:LLMs have achieved significant advantages in TCM applications and can effectively address intelligent TCM tasks.Further in-depth development of LLMs is needed in various vertical TCM fields,including clinical and fundamental research.Focusing on the functional segmentation development direction of generative AI technologies in TCM application scenarios to meet the practical needs-oriented demands of TCM digitalization is essential.
基金supported by the National Natural Science Foundation of China(No.62306281)the Natural Science Foundation of Zhejiang Province(Nos.LQ23E060006 and LTGG24E050005)the Key Research Plan of Jiaxing City(No.2024BZ20016).
文摘In the era of big data,data-driven technologies are increasingly leveraged by industry to facilitate autonomous learning and intelligent decision-making.However,the challenge of“small samples in big data”emerges when datasets lack the comprehensive information necessary for addressing complex scenarios,which hampers adaptability.Thus,enhancing data completeness is essential.Knowledge-guided virtual sample generation transforms domain knowledge into extensive virtual datasets,thereby reducing dependence on limited real samples and enabling zero-sample fault diagnosis.This study used building air conditioning systems as a case study.We innovatively used the large language model(LLM)to acquire domain knowledge for sample generation,significantly lowering knowledge acquisition costs and establishing a generalized framework for knowledge acquisition in engineering applications.This acquired knowledge guided the design of diffusion boundaries in mega-trend diffusion(MTD),while the Monte Carlo method was used to sample within the diffusion function to create information-rich virtual samples.Additionally,a noise-adding technique was introduced to enhance the information entropy of these samples,thereby improving the robustness of neural networks trained with them.Experimental results showed that training the diagnostic model exclusively with virtual samples achieved an accuracy of 72.80%,significantly surpassing traditional small-sample supervised learning in terms of generalization.This underscores the quality and completeness of the generated virtual samples.
文摘Model evaluation using benchmark datasets is an important method to measure the capability of large language models(LLMs)in specific domains,and it is mainly used to assess the knowledge and reasoning abilities of LLMs.Therefore,in order to better assess the capability of LLMs in the agricultural domain,Agri-Eval was proposed as a benchmark for assessing the knowledge and reasoning ability of LLMs in agriculture.The assessment dataset used in Agri-Eval covered seven major disciplines in the agricultural domain:crop science,horticulture,plant protection,animal husbandry,forest science,aquaculture science,and grass science,and contained a total of 2283 questions.Among domestic general-purpose LLMs,DeepSeek R1 performed best with an accuracy rate of 75.49%.In the realm of international general-purpose LLMs,Gemini 2.0 pro exp 0205 standed out as the top performer,achieving an accuracy rate of 74.28%.As an LLMs in agriculture vertical,Shennong V2.0 outperformed all the LLMs in China,and the answer accuracy rate of agricultural knowledge exceeded that of all the existing general-purpose LLMs.The launch of Agri-Eval helped the LLM developers to comprehensively evaluate the model's capability in the field of agriculture through a variety of tasks and tests to promote the development of the LLMs in the field of agriculture.
基金funded by the Office of the Vice-President for Research and Development of Cebu Technological University.
文摘This study demonstrates a novel integration of large language models,machine learning,and multicriteria decision-making to investigate self-moderation in small online communities,a topic under-explored compared to user behavior and platform-driven moderation on social media.The proposed methodological framework(1)utilizes large language models for social media post analysis and categorization,(2)employs k-means clustering for content characterization,and(3)incorporates the TODIM(Tomada de Decisão Interativa Multicritério)method to determine moderation strategies based on expert judgments.In general,the fully integrated framework leverages the strengths of these intelligent systems in a more systematic evaluation of large-scale decision problems.When applied in social media moderation,this approach promotes nuanced and context-sensitive self-moderation by taking into account factors such as cultural background and geographic location.The application of this framework is demonstrated within Facebook groups.Eight distinct content clusters encompassing safety,harassment,diversity,and misinformation are identified.Analysis revealed a preference for content removal across all clusters,suggesting a cautious approach towards potentially harmful content.However,the framework also highlights the use of other moderation actions,like account suspension,depending on the content category.These findings contribute to the growing body of research on self-moderation and offer valuable insights for creating safer and more inclusive online spaces within smaller communities.
文摘载人航天器研制过程中,人因要素在早期阶段融入设计仍有待提升,且常规的基于模型的系统工程(model based systems engineering,MBSE)体系缺少将人与系统其余部分进行整合的充分考虑,导致开发迭代周期变长,也大幅增加了研制成本。针对这一问题,提出载人月球探测任务人因领域元模型构建方法,在人-系统整合的框架下,采用MBSE将人因需求整合至载人航天器的开发过程中,并基于系统建模语言SysML建立人因领域元模型,以实现在载人月球探测产品开发的全生命周期中融入人因需求,为产品的规划、设计和开发提供支持,有效减少研制中出现人因设计问题,降低研制成本。通过载人月球探测任务的典型案例进行建模,验证人因领域元模型建立方法的有效性,为类似系统设计的MBSE扩展应用提供参考。
文摘Whether or not a software system satisfies the anticipated user requirements is ultimately determined by the behaviors of the software. So it is necessary and valuable to research requirements modeling language and technique from the perspective of behavior. This paper presents a lightweight behavior based requirements modeling language BDL with formal syntax and semantics, and a general-purpose requirements description model BRM synthesizing the concepts of viewpoint and scenario. BRM is good for modeling large and complex system due to its structure is very clear. In addition, the modeling process is demonstrated through the case study On-Line Campus Management System. By lightweight formal style, BDL & BRM can effectively bridge the gap between practicability and rigorousness of formal requirements modeling language and technique.
基金The National Natural Science Founda-tion of China ( No. 60473004)the Science and ResearchFoundation Program of Henan University of Science and Tech-nology (No.2004ZY041)the Natural and Science FoundationProgram of the Education Department of Henan Province (No.200410464004)
文摘A language model for information retrieval is built by using a query language model to generate queries and a document language model to generate documents. The documents are ranked according to the relative entropies of estimated document language models with respect to the estimated query language model. Two popular and relatively efficient smoothing methods, the Jelinek- Mercer method and the absolute discounting method, are used to smooth the document language model in estimation of the document language, A combined model composed of the feedback document language model and the collection language model is used to estimate the query model. A performacne comparison between the new retrieval method and the existing method with feedback is made, and the retrieval performances of the proposed method with the two different smoothing techniques are evaluated on three Text Retrieval Conference (TREC) data sets. Experimental results show that the method is effective and performs better than the basic language modeling approach; moreover, the method using the Jelinek-Mercer technique performs better than that using the absolute discounting technique, and the perfomance is sensitive to the smoothing peramters.
文摘Understanding language competition and extinction is an interdisciplinary challenge, and math models provide a tool for interpreting linguistic census data and possibly predict the language shift trend at the population scale. In this study, new data from previously examined areas were modeled, specifically Catalan and Spanish in Catalonia, Spanish and English in Houston, Texas, Dutch and French in Brussels, Euskera and Spanish in Spain and French and English in Canada. Three mathematical models of the language competition have been validated. The first is the Abrams-Strogatz model, which treats populations as having two monolingual groups. The second is the Castelló model, which considers bilingual speakers. The third is the Mira model, which considers language competition when the two languages have high similarities. It was found that the some of the data matched Abrams-Strogatz original model, but some divergences could still be addressed. It was also found that the Mira model needs some improvement in how it treats the differences between languages.
文摘A mode of ontology-based information integration and management( OIIM) for testability scheme was proposed through expatiating on the connotation of the system testability scheme.Aiming at the complexity of influencing factors in optimal design procedure of the testability scheme, the information of concept entities,concept attributions and concept relationships was analyzed and extracted,and then the testability scheme information ontology( TSIO) was built and coded via web ontology language( OWL).Based on the information ontology, the generalized model for testability scheme( GMTS) was founded by defining transformation rules. The primary study shows that the mode of OIIM for testability scheme can make up the deficiencies in knowledge representation and reasoning existing in traditional information models,and achieve the information share and reuse. It provides the effectual model basis for the optimal design of the testability scheme.
基金the Yunnan Provincial Department of Education Research Fund Key Project(No.2011z025)General Project(No.2011y214)
文摘Domain-specific metamodeling language(DSMML) defined by informal method cannot strictly represent its structural semantics,so its properties such as consistency cannot be holistically and systematically verified.In response,the paper proposes a formal representation of the structural semantics of DSMML named extensible markup language(XML) based metamodeling language(XMML) and its metamodels consistency verification method.Firstly,we describe our approach of formalization,based on this,the method of consistency verification of XMML and its metamodels based on first-order logical inference is presented;then,the formalization automatic mapping engine for metamodels is designed to show the feasibility of our formal method.
文摘Language is a special social phenomenon and is always on the changing process with the development of society. During the evolving process of language, new language varieties will continuously emerge due to the changes of some social and cultural factors. Cyber language is universally accepted as one type of the social language varieties. Basically, cyber language can be treated as a complex adaptive system which is influenced by the interaction between users’ cognition, social culture and the surrounding environments. Thus it is safe to say that cyber language is always undergoing a dynamic evolving process. With the usage-based language model as the theoretical foundation, this paper proposes a Complex Adaptive System (CAS) approach to analyze the expression of Appreciation to explore the complex, dynamic and nonlinear development of cyber language from the angle of meaning construction, grammaticalization and functional adaption respectively. It is found that the expression of Appreciation is experiencing adaptively a semantic connotations development and a process of grammatical functions expansion as well. This paper suggests that the emergence and development of cyber language is a novel and trendy social language phenomenon. Network language can achieve its process and evolution under the huge impact of social changes and social promotions. When faced with the changing surroundings, cyber language itself enjoys a timely adaption and responsive development to keep up with the new environments, which reflects the basic principle of language development, namely, language changes with the development of society.
基金Supported by the Foundation of Harbin Engineering University Foundation under Grant No.HEUFT05035
文摘This paper informally introduces colored object-oriented Petri Nets(COOPN) with the application of the AUV system.According to the characteristic of the AUV system's running environment,the object-oriented method is used in this paper not only to dispart system modules but also construct the refined running model of AUV system,then the colored Petri Net method is used to establish hierarchically detailed model in order to get the performance analyzing information of the system.After analyzing the model implementation,the errors of architecture designing and function realization can be found.If the errors can be modified on time,the experiment time in the pool can be reduced and the cost can be saved.