Objective expertise evaluation of individuals, as a prerequisite stage for team formation, has been a long-term desideratum in large software development companies. With the rapid advancements in machine learning methods, based on reliable existing data stored in project management tools' datasets, automating this evaluation process becomes a natural step forward. In this context, our approach focuses on quantifying software developer expertise using metadata from task-tracking systems. For this, we mathematically formalize two categories of expertise: technology-specific expertise, which denotes the skills required for a particular technology, and general expertise, which encapsulates overall knowledge of the software industry. Afterward, we automatically classify the zones of expertise associated with each task a developer has worked on, using Bidirectional Encoder Representations from Transformers (BERT)-like transformers to handle the unique characteristics of project tool datasets effectively. Finally, our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives. The method was experimentally validated, yielding promising results. Funding: supported by the project "Romanian Hub for Artificial Intelligence-HRIA", Smart Growth, Digitization and Financial Instruments Program, 2021–2027, MySMIS No. 334906.
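The zone-of-expertise step described above is, at its core, multi-class text classification over task summaries. The following minimal sketch illustrates that step with a BERT-like encoder; the zone labels, checkpoint, and example task are illustrative assumptions rather than the authors' actual configuration, and the model would first be fine-tuned on labeled task metadata.

```python
# Minimal sketch: classifying task-tracker summaries into zones of expertise
# with a BERT-like encoder. Zone labels and checkpoint are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ZONES = ["backend", "frontend", "devops", "database"]  # hypothetical zones

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(ZONES)
)  # in practice, fine-tuned first on labeled task metadata

def classify_task(summary: str) -> str:
    """Map one task summary to its most likely zone of expertise."""
    inputs = tokenizer(summary, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return ZONES[int(logits.argmax(dim=-1))]

print(classify_task("Fix N+1 query in the ORM layer of the billing service"))
```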
Since Google introduced the concept of Knowledge Graphs (KGs) in 2012, their construction technologies have evolved into a comprehensive methodological framework encompassing knowledge acquisition, extraction, representation, modeling, fusion, computation, and storage. Within this framework, knowledge extraction, as the core component, directly determines KG quality. In military domains, traditional manual curation models face efficiency constraints due to data fragmentation, complex knowledge architectures, and confidentiality protocols. Meanwhile, crowdsourced ontology construction approaches from general domains prove non-transferable, while human-crafted ontologies struggle with generalization deficiencies. To address these challenges, this study proposes an Ontology-Aware LLM Methodology for Military Domain Knowledge Extraction (LLM-KE). This approach leverages the deep semantic comprehension capabilities of Large Language Models (LLMs) to simulate human experts' cognitive processes in crowdsourced ontology construction, enabling automated extraction of military textual knowledge. It concurrently enhances knowledge processing efficiency and improves KG completeness. Empirical analysis demonstrates that this method effectively resolves scalability and dynamic adaptation challenges in military KG construction, establishing a novel technological pathway for advancing military intelligence development.
The Heber River Diversion Dam (Heber Dam) and 3.6 km penstock on Vancouver Island, British Columbia, Canada were built in 1953 and by 2009 had reached the end of their useful life due to deteriorated wooden structures. A decision was taken to remove the dam, return the flows in the Heber River to pre-dam conditions, and restore the footprint of the dam and penstock. Plans were developed for the removal of the dam and contaminated materials, including the creosote-coated wooden penstock and other wooden structures associated with the dam, and for site restoration. Work on removal and restoration was undertaken over the summer and fall of 2012 and the spring of 2013. Restoration treatments were based on the use of natural processes as a model for recovery. The recovery of dam and penstock removal disturbances was initiated in the late summer and fall of 2012 with the fall dispersal of seeds from mature pioneering species that formed a significant part of the local undisturbed vegetation. This paper describes the treatments that were applied to enhance the natural recovery of the disturbed areas and the results of those treatments. The restoration treatments were designed to address the filters that were present in project areas. These were identified during an initial inspection in 2009 and were centred on compaction of substrates and a lack of micro-sites. In addition to the use of natural processes for the restoration of project disturbances, a local First Nations crew was hired to transplant sword ferns (Polystichum munitum (Kaulf.) C. Presl) from the adjacent forest areas onto project sites to provide a social benefit from the restoration work.
Information extraction (IE) aims to automatically identify and extract information of specific interest from raw texts. Despite the abundance of solutions based on fine-tuning pretrained language models, IE in few-shot and zero-shot scenarios remains highly challenging due to the scarcity of training data. Large language models (LLMs), on the other hand, can generalize well to unseen tasks with few-shot demonstrations or even zero-shot instructions and have demonstrated impressive ability across a wide range of natural language understanding and generation tasks. Nevertheless, it is unclear whether such effectiveness can be replicated in IE, where the target tasks involve specialized schemas and quite abstract entity and relation concepts. In this paper, we first examine the validity of LLMs in executing IE tasks with an established prompting strategy and further propose multiple types of augmented prompting methods, including the structured fundamental prompt (SFP), the structured interactive reasoning prompt (SIRP), and the voting-enabled structured interactive reasoning prompt (VESIRP). The experimental results demonstrate that while direct prompting yields inferior performance, the proposed augmented prompting methods significantly improve extraction accuracy, achieving comparable or even better performance (e.g., on zero-shot FewNERD and FewNERD-INTRA) than state-of-the-art methods that require large-scale training samples. This study represents a systematic exploration of employing instruction-following LLMs for the task of IE. It not only establishes a performance benchmark for this novel paradigm but, more importantly, validates a practical technical pathway through the proposed prompt enhancement methods, offering a viable solution for efficient IE in low-resource settings. Funding: supported by the National Natural Science Foundation of China (62222212).
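To make schema-aware prompting concrete, the sketch below assembles a zero-shot named-entity-extraction prompt constrained to a fixed schema and parses the model's JSON reply. The entity types and the `call_llm` callable are placeholders; the paper's actual SFP, SIRP, and VESIRP templates are not reproduced here.

```python
# Hedged sketch of schema-constrained zero-shot entity extraction. The entity
# schema is illustrative, and `call_llm` is a placeholder for any chat API;
# the paper's SFP/SIRP/VESIRP prompt formats are not reproduced here.
import json

SCHEMA = {
    "person": "names of people",
    "organization": "companies, institutions, agencies",
    "location": "geopolitical places",
}

def build_prompt(text: str) -> str:
    types = "\n".join(f"- {name}: {desc}" for name, desc in SCHEMA.items())
    return (
        "Extract entities of the following types from the text.\n"
        f"{types}\n"
        'Reply only with JSON of the form {"type": ["mention", ...]}.\n'
        f"Text: {text}"
    )

def extract(text: str, call_llm) -> dict:
    """call_llm: a function mapping a prompt string to the model's reply."""
    reply = call_llm(build_prompt(text))
    return json.loads(reply)  # structured output, ready for evaluation
```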
This editorial explores the transformative potential of artificial intelligence (AI) in identifying conflicts of interest (COIs) within academic and scientific research. By harnessing advanced data analysis, pattern recognition, and natural language processing techniques, AI offers innovative solutions for enhancing transparency and integrity in research. This editorial discusses how AI can automatically detect COIs, integrate data from various sources, and streamline reporting processes, thereby maintaining the credibility of scientific findings.
DeepSeek, a Chinese open-source artificial intelligence (AI) model, has gained a lot of attention due to its economical training and efficient inference. DeepSeek, trained with large-scale reinforcement learning without supervised fine-tuning as a preliminary step, demonstrates remarkable reasoning capabilities across a wide range of tasks. DeepSeek is a prominent AI-driven chatbot that assists individuals in learning and enhances responses by generating insightful solutions to inquiries. Users hold divergent viewpoints on advanced models like DeepSeek, posting about both their merits and shortcomings across several social media platforms. This research presents a new framework for predicting public sentiment to evaluate perceptions of DeepSeek. To transform the unstructured data into a suitable form, we initially collect DeepSeek-related tweets from Twitter and subsequently apply various preprocessing methods. We then annotate the tweets using the Valence Aware Dictionary and sEntiment Reasoner (VADER) methodology and the lexicon-driven TextBlob. Next, we classify the attitudes obtained from the purified data using the proposed hybrid model, which combines long short-term memory (LSTM) and bidirectional gated recurrent units (BiGRU). To strengthen it, we include multi-head attention, regularization, activation, and dropout units to enhance performance. Topic modeling, employing K-Means clustering and Latent Dirichlet Allocation (LDA), was utilized to analyze public behavior concerning DeepSeek. The perceptions show that 82.5% of the people are positive, 15.2% negative, and 2.3% neutral using TextBlob, and 82.8% positive, 16.1% negative, and 1.2% neutral using the VADER analysis. The slight difference in results indicates that both analyses concur in their overall perceptions while capturing distinct language peculiarities. The results indicate that the proposed model surpasses previous state-of-the-art approaches.
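The lexicon-based annotation step lends itself to a short illustration. The sketch below labels a tweet with both VADER and TextBlob, using the conventional compound-score cutoffs of ±0.05 from the VADER documentation; the example tweet is invented, and the paper's exact thresholds are an assumption.

```python
# Sketch of the lexicon-based annotation step: labeling tweets with VADER and
# TextBlob before training the hybrid classifier. Thresholds follow the
# common VADER convention, not necessarily the paper's settings.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

vader = SentimentIntensityAnalyzer()

def vader_label(text: str) -> str:
    c = vader.polarity_scores(text)["compound"]
    return "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"

def textblob_label(text: str) -> str:
    p = TextBlob(text).sentiment.polarity
    return "positive" if p > 0 else "negative" if p < 0 else "neutral"

tweet = "DeepSeek answered my question instantly, impressive reasoning!"
print(vader_label(tweet), textblob_label(tweet))
```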
Natural language processing (NLP) technologies, such as ChatGPT, are revolutionizing various fields, including finance research. This article explores the potential of ChatGPT as a transformative tool for finance researchers. We illustrate various applications of ChatGPT in finance research, from analyzing financial charts and providing coding support to the theoretical derivation of financial models. Significant advances in multimodal learning, such as Visual Referring Prompting (VRP), are also explored for their potential to enhance ChatGPT's image analysis capabilities. Furthermore, we conduct a comparative analysis of ChatGPT-3.5, ChatGPT-4, and Microsoft Bing to examine their distinct features, strengths, and weaknesses, providing valuable insights into their applicability in finance research. We demonstrate the innovative opportunities and insights provided by the development of ChatGPT to enrich the financial research process. By addressing the potential pitfalls and ethical considerations associated with using ChatGPT, we aim to promote responsible AI adoption and a more in-depth understanding of the role of advanced NLP technologies in shaping the future of finance research and practice. Overall, this paper underscores ChatGPT's transformative role in finance research, detailing its applications, benefits, and challenges, and advocating for ethical AI adoption to shape the future of the field.
Dialectal Arabic text classification (DA-TC) provides a mechanism for performing sentiment analysis on recent Arabic social media, a task that poses many challenges owing to the natural morphology of the Arabic language and its wide range of dialect variations. The availability of annotated datasets is limited, and preprocessing of the noisy content is even more challenging, sometimes resulting in the removal of important cues of sentiment from the input. To overcome such problems, this study investigates the applicability of transfer learning based on pre-trained transformer models to classify sentiment in Arabic texts with high accuracy. Specifically, it uses the CAMeLBERT model fine-tuned on the Multi-Domain Arabic Resources for Sentiment Analysis (MARSA) dataset, containing more than 56,000 manually annotated tweets across political, social, sports, and technology domains. The proposed method avoids extensive preprocessing and shows that raw data provide better results because they tend to retain more linguistic features. The fine-tuned CAMeLBERT model produces state-of-the-art accuracy of 92%, precision of 91.7%, recall of 92.3%, and F1-score of 91.5%, outperforming standard machine learning models and ensemble-based/deep learning techniques. Our performance comparisons against other pre-trained models, namely AraBERTv02-twitter and MARBERT, show that transformer-based architectures are consistently the best suited when dealing with noisy Arabic texts. This work provides a strong remedy for the problems in Arabic sentiment analysis and offers recommendations on easy tuning of pre-trained models to adapt to challenging linguistic features and domain-specific tasks. Funding: funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2504).
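As a rough illustration of the fine-tuning setup, the sketch below adapts a public CAMeLBERT checkpoint for three-way sentiment classification on raw, unpreprocessed tweets. The checkpoint name and hyperparameters are assumptions for illustration, not the paper's reported configuration.

```python
# Hedged sketch of fine-tuning a CAMeLBERT checkpoint for three-way sentiment
# on raw tweets, mirroring the paper's finding that minimal preprocessing
# preserves useful linguistic cues. Checkpoint and hyperparameters are
# illustrative, not the reported configuration.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

ckpt = "CAMeL-Lab/bert-base-arabic-camelbert-da"  # a public CAMeLBERT variant
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=3)

def tokenize(batch):
    # tweets go in as-is: no stemming, no emoji or diacritic stripping
    return tokenizer(batch["text"], truncation=True, max_length=128)

args = TrainingArguments(output_dir="camelbert-marsa", num_train_epochs=3,
                         per_device_train_batch_size=32, learning_rate=2e-5)
# With a tokenized MARSA split `train_ds` (text plus integer labels):
# Trainer(model=model, args=args,
#         train_dataset=train_ds.map(tokenize, batched=True)).train()
```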
This review presents a comprehensive and forward-looking analysis of how Large Language Models (LLMs) are transforming knowledge discovery in the rational design of advanced micro/nano electrocatalyst materials. Electrocatalysis is central to sustainable energy and environmental technologies, but traditional catalyst discovery is often hindered by high complexity, fragmented knowledge, and inefficiencies. LLMs, particularly those based on Transformer architectures, offer unprecedented capabilities in extracting, synthesizing, and generating scientific knowledge from vast unstructured textual corpora. This work provides the first structured synthesis of how LLMs have been leveraged across various electrocatalysis tasks, including automated information extraction from literature, text-based property prediction, hypothesis generation, synthesis planning, and knowledge graph construction. We comparatively analyze leading LLMs and domain-specific frameworks (e.g., CatBERTa, CataLM, CatGPT) in terms of methodology, application scope, performance metrics, and limitations. Through curated case studies across key electrocatalytic reactions (HER, OER, ORR, and CO2RR), we highlight emerging trends such as the growing use of embedding-based prediction, retrieval-augmented generation, and fine-tuned scientific LLMs. The review also identifies persistent challenges, including data heterogeneity, hallucination risks, a lack of standard benchmarks, and limited multimodal integration. Importantly, we articulate future research directions, such as the development of multimodal and physics-informed MatSci-LLMs, enhanced interpretability tools, and the integration of LLMs with self-driving laboratories for autonomous discovery. By consolidating fragmented advances and outlining a unified research roadmap, this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation through large language model technologies.
The increasing frequency and severity of natural disasters, exacerbated by global warming, necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems (CISs). Recent research reveals the significant potential of natural language processing (NLP) to analyze unstructured human language during disasters, thereby facilitating the uncovering of disruptions and providing situational awareness supporting various aspects of the resilience of CISs. Despite this potential, few studies have systematically mapped the global research on NLP applications for supporting the resilience of CISs. This paper contributes to the body of knowledge by presenting a review of current knowledge using the scientometric review technique. Using 231 bibliographic records from the Scopus and Web of Science core collections, we identify five key research areas where researchers have used NLP to support the resilience of CISs during natural disasters: sentiment analysis, crisis informatics, data and knowledge visualization, disaster impacts, and content analysis. Furthermore, we map the utility of NLP in the identified research foci with respect to four aspects of resilience (i.e., preparedness, absorption, recovery, and adaptability) and present various common techniques used and potential future research directions. This review highlights that NLP has the potential to become a supplementary data source to support the resilience of CISs. The results of this study serve as an introductory-level guide designed to help scholars and practitioners unlock the potential of NLP for strengthening the resilience of CISs against natural disasters. Funding: financial support from the National Science Foundation (NSF) EPSCoR R.I.I. Track-2 Program, awarded under NSF grant number 2119691.
Grassland degradation presents overwhelming challenges to biodiversity, ecosystem services, and the socioeconomic sustainability of dependent communities. However, a comprehensive synthesis of global knowledge on the frontiers and key areas of grassland degradation research has not been achieved due to the limitations of traditional scientometric methods. The present synthesis employed BERTopic, an advanced natural language processing tool, to analyze the extensive ecological literature on grassland degradation. We compiled a dataset of 4,504 publications from the Web of Science core collection database and used it to evaluate the geographic distribution and temporal evolution of different grassland types and available knowledge on the subject. Our analysis identified key topics in the global grassland degradation research domain, including the effects of grassland degradation on ecosystem functions, grassland ecological restoration and biodiversity conservation, and erosion processes and hydrological models in grasslands, among others. The BERTopic analysis significantly outperforms traditional methods in identifying complex and evolving topics in large literature datasets. Compared to traditional scientometric analysis, BERTopic provides a more comprehensive perspective on the research areas, revealing not only popular topics but also emerging research areas that traditional methods may overlook, although scientometrics offers more specificity and detail. Therefore, we argue for the simultaneous use of both approaches to achieve more systematic and comprehensive assessments of specific research areas. This study represents an emerging application of BERTopic algorithms in ecological research, particularly in the critical research focused on global grassland degradation. It also highlights the need for integrating advanced computational methods into ecological research in this era of data explosion. Tools like the BERTopic algorithm are essential for enhancing our understanding of complex environmental problems, and this work marks an important stride towards more sophisticated, data-driven analysis in ecology. Funding: financially supported by the First-Class Curriculum Program at the School of Economics and Management, University of the Chinese Academy of Sciences, the National Natural Science Foundation of China (42041005), and the National Social Science Foundation of China (23BTQ054).
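For readers unfamiliar with the tool, the sketch below shows the shape of such a BERTopic analysis over a corpus of abstracts; the parameter values are illustrative rather than the study's settings, and a corpus of at least a few hundred documents is needed for meaningful clusters.

```python
# Minimal BERTopic sketch for literature analysis; min_topic_size and
# language are illustrative, not the study's actual settings.
from bertopic import BERTopic

def model_topics(abstracts: list[str]) -> BERTopic:
    """abstracts: e.g., Web of Science records; needs hundreds of documents."""
    topic_model = BERTopic(language="english", min_topic_size=10)
    topic_model.fit(abstracts)
    return topic_model

# topic_model = model_topics(abstracts)
# print(topic_model.get_topic_info())  # one row per discovered topic
```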
The increasing fluency of advanced language models, such as GPT-3.5, GPT-4, and the recently introduced DeepSeek, challenges the ability to distinguish between human-authored and AI-generated academic writing. This situation raises significant concerns regarding the integrity and authenticity of academic work. In light of the above, the current research evaluates the effectiveness of Bidirectional Long Short-Term Memory (BiLSTM) networks enhanced with pre-trained GloVe (Global Vectors for Word Representation) embeddings in detecting AI-generated scientific abstracts drawn from the AI-GA (Artificial Intelligence Generated Abstracts) dataset. Two core BiLSTM variants were assessed: a single-layer approach and a dual-layer design, each tested under static or adaptive embeddings. The single-layer model achieved nearly 97% accuracy with trainable GloVe, occasionally surpassing the deeper model. Despite these gains, neither configuration fully matched the 98.7% benchmark set by an earlier LSTM-Word2Vec pipeline. Some runs overfitted when embeddings were fine-tuned, whereas static embeddings offered a slightly lower yet stable accuracy of around 96%. This lingering gap reinforces a key ethical and procedural concern: relying solely on automated tools, such as Turnitin's AI-detection features, to penalize individuals risks unjust outcomes. Misclassifications occur in both directions, whether legitimate work is misread as AI-generated or engineered text evades detection, demonstrating that these classifiers should not stand as the sole arbiters of authenticity. A more comprehensive approach is warranted, one which weaves model outputs into a systematic process supported by expert judgment and institutional guidelines designed to protect originality.
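The static-versus-adaptive embedding comparison can be expressed compactly. The sketch below builds a single-layer BiLSTM detector in Keras whose `trainable` flag switches between the two setups; layer sizes are illustrative, and the random matrix stands in for real GloVe vectors.

```python
# Sketch of the single-layer BiLSTM detector with static vs. adaptive GloVe
# embeddings. Dimensions are illustrative; the random matrix stands in for
# real pre-trained GloVe vectors.
import numpy as np
from tensorflow.keras import layers, models

VOCAB, DIM = 20000, 100  # vocabulary size and embedding dimension

def build_detector(glove_matrix: np.ndarray, trainable: bool) -> models.Model:
    model = models.Sequential([
        layers.Input(shape=(None,)),            # padded token-id sequences
        layers.Embedding(VOCAB, DIM, trainable=trainable),
        layers.Bidirectional(layers.LSTM(128)),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # P(abstract is AI-generated)
    ])
    model.layers[0].set_weights([glove_matrix])  # load pre-trained vectors
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

glove = np.random.rand(VOCAB, DIM).astype("float32")  # stand-in for GloVe
static_model = build_detector(glove, trainable=False)   # frozen embeddings
adaptive_model = build_detector(glove, trainable=True)  # fine-tuned embeddings
```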
As a result of breakthroughs in computational approaches combined with a boom in multi-omics data, the development of numerous digital medicines and bioinformatics tools has helped speed up healthcare industry processes. The traditional healthcare development method has been further rationalized with the introduction of artificial intelligence (AI), deep learning (DL), and machine learning (ML). Wide-ranging biological and clinical data in the form of big data, stored in various databases worldwide, serve as the raw material for AI-based methods and aid in the precise identification of patterns and models. These patterns and models can be used to identify novel therapeutically active molecules with significantly less time, financial investment, and workforce. This review article provides insights into the principles of AI technologies such as next-generation sequencing (NGS), natural language processing (NLP), radiological imaging, patients' electronic medical records (EMR), and drug discovery, as well as the ethical, economic, and social ramifications of their use. This review also highlights various applications of AI in the healthcare industry, along with analyses of different AI technologies. Additionally, it offers helpful suggestions to assist decision-makers in creating an AI plan that would support their shift to a digital healthcare system.
Urdu, a prominent subcontinental language, serves as a versatile means of communication. However, its handwritten expressions present challenges for optical character recognition (OCR). While various OCR techniques have been proposed, most of them focus on recognizing printed Urdu characters and digits. To the best of our knowledge, very little research has focused solely on recognition of pure Urdu handwriting, and the results of such proposed methods are often inadequate. In this study, we introduce a novel approach to recognizing pure handwritten Urdu digits and characters using Convolutional Neural Networks (CNNs). Our proposed method utilizes convolutional layers to extract important features from input images and classifies them using fully connected layers, enabling efficient and accurate detection of Urdu handwritten digits and characters. We implemented the proposed technique on a large publicly available dataset of Urdu handwritten digits and characters. The findings demonstrate that the CNN model achieves an accuracy of 98.30% and an F1 score of 88.6%, indicating its effectiveness in detecting and classifying Urdu handwritten digits and characters. These results have far-reaching implications for various applications, including document analysis, text recognition, and language understanding, which have previously been unexplored in the context of Urdu handwriting data. This work lays a solid foundation for future research and development in Urdu language detection and processing, opening up new opportunities for advancement in this field.
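A small model of the kind described, convolutional feature extraction followed by fully connected classification, might look like the sketch below; the input size and class count are illustrative assumptions, not the paper's reported architecture.

```python
# Sketch of a compact CNN for handwritten digit/character recognition:
# convolutional feature extraction, then fully connected classification.
# Input size and class count are illustrative assumptions.
from tensorflow.keras import layers, models

NUM_CLASSES = 50  # hypothetical: 10 digits plus ~40 character classes

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),          # grayscale handwriting patches
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```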
Image captioning has seen significant research efforts over the last decade. The goal is to generate meaningful, syntactically accurate semantic sentences that describe the visual content depicted in photographs. Many real-world applications rely on image captioning, such as helping people with visual impairments perceive their surroundings. To formulate a coherent and relevant textual description, computer vision techniques are utilized to comprehend the visual content within an image, followed by natural language processing methods. Numerous approaches and models have been developed to deal with this multifaceted problem, and several models prove to be state-of-the-art solutions in this field. This work offers an exclusive perspective emphasizing the most critical strategies and techniques for enhancing image caption generation. Rather than reviewing all previous image captioning work, we analyze various techniques that significantly improve image caption generation and achieve significant performance improvements, including image captioning with visual attention methods, exploring semantic information types in captions, and employing multi-caption generation techniques. Further, advancements such as neural architecture search, few-shot learning, multi-phase learning, and cross-modal embedding within image caption networks are examined for their transformative effects. The comprehensive quantitative analysis conducted in this study identifies cutting-edge methodologies and sheds light on their profound impact, driving forward the forefront of image captioning technology. Funding: supported by the National Natural Science Foundation of China (Nos. U22A2034, 62177047), the High Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
This study examines the role of village regulations within China's Litigation Source Governance (LSG) framework, specifically analyzing Tianjin Municipality's 2023 Model Village Regulations. Employing legal analysis and Natural Language Processing (NLP) techniques, the research evaluates the effectiveness, enforceability, and thematic orientation of these regulations in grassroots dispute resolution. Findings reveal a pronounced reliance on moral governance provisions, limited judicial recognition, and significant implementation challenges due to the predominance of non-binding (soft) clauses. The study recommends enhancing judicial recognition through formal confirmation mechanisms, increasing legally binding clauses, and integrating village-level governance more closely with formal judicial processes. This approach not only strengthens local governance but also provides valuable insights for nationwide replication, supporting broader goals of rural stability and governance modernization. Funding: Tianjin Education Commission Research Program, Humanities and Social Sciences (Project No. 2022SK064); Innovation Training Program at Tianjin Normal University in 2024, "Research on the Function of Rural Norms in Source Governance of Disputes from the Perspective of Rural Revitalization" (Project No. 202410065027).
X (formerly known as Twitter) is one of the most prominent social media platforms, enabling users to share short messages (tweets) with the public or their followers. It serves various purposes, from real-time news dissemination and political discourse to trend spotting and consumer engagement. X has emerged as a key space for understanding shifting brand perceptions, consumer preferences, and product-related sentiment in the fashion industry. However, the platform's informal, dynamic, and context-dependent language poses substantial challenges for sentiment analysis, particularly when attempting to detect sarcasm, slang, and nuanced emotional tones. This study introduces a hybrid deep learning framework that integrates Transformer encoders, recurrent neural networks (i.e., Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)), and attention mechanisms to improve the accuracy of fashion-related sentiment classification. These methods were selected for their proven strength in capturing both contextual dependencies and sequential structures, which are essential for interpreting short-form text. Our model was evaluated on a dataset of 20,000 fashion tweets. The experimental results demonstrate a classification accuracy of 92.25%, outperforming conventional models such as Logistic Regression, Linear Support Vector Machine (SVM), and even standalone LSTM by a margin of up to 8%. This improvement highlights the importance of hybrid architectures in handling noisy, informal social media data. The study's findings offer strong implications for digital marketing and brand management, where timely sentiment detection is critical. Despite the promising results, challenges remain regarding the precise identification of negative sentiments, indicating that further work is needed to detect subtle and contextually embedded expressions.
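One plausible wiring of such a hybrid, a Transformer-style attention block feeding recurrent layers, is sketched below; the layer sizes and exact composition are assumptions for illustration, since the abstract does not specify them.

```python
# Hedged sketch of a hybrid sentiment classifier: a Transformer-style
# self-attention block feeding BiLSTM and GRU layers. All sizes illustrative.
from tensorflow.keras import layers, models

VOCAB, DIM, MAXLEN, CLASSES = 30000, 128, 60, 3

inp = layers.Input(shape=(MAXLEN,))
x = layers.Embedding(VOCAB, DIM)(inp)
attn = layers.MultiHeadAttention(num_heads=4, key_dim=DIM // 4)(x, x)
x = layers.LayerNormalization()(x + attn)      # residual attention block
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GRU(64)(x)                          # sequential summarization
out = layers.Dense(CLASSES, activation="softmax")(x)  # neg / neu / pos

model = models.Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```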
The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents, who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims. However, research on grooming detection in South Korea remains limited, as existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that shows excellent performance in processing colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from HuggingFace, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, the OCR technology shows high accuracy in extracting text from images, demonstrating that the proposed framework is effective for online grooming detection. The proposed framework is expected to contribute to more accurate detection of grooming text and the prevention of grooming-related crimes.
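The two-stage pipeline, OCR followed by transformer classification, can be sketched as follows. The public `beomi/KcELECTRA-base` checkpoint and the Tesseract OCR engine are stand-ins here; the paper fine-tunes KcELECTRA on the corpora listed above, and the binary head below is untrained.

```python
# Sketch of the two-stage pipeline: OCR the chat screenshot with Tesseract's
# Korean model, then score the text with a KcELECTRA classifier. The base
# checkpoint is shown untrained; the paper fine-tunes it on Korean corpora.
import torch
import pytesseract
from PIL import Image
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "beomi/KcELECTRA-base"  # public colloquial-Korean ELECTRA model
tokenizer = AutoTokenizer.from_pretrained(ckpt)
classifier = AutoModelForSequenceClassification.from_pretrained(ckpt,
                                                                num_labels=2)

def screen_image(path: str) -> bool:
    """True if the OCR'd conversation is flagged as grooming-like."""
    text = pytesseract.image_to_string(Image.open(path), lang="kor")
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = classifier(**inputs).logits
    return bool(logits.argmax(dim=-1))
```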
Recent advances in contrastive language-image pretraining (CLIP) models and generative AI have demonstrated significant capabilities in cross-modal understanding and content generation. Based on these developments, this study introduces a novel framework for airfoil design via natural language interfaces. To the authors' knowledge, this study establishes the first end-to-end, bidirectional mapping between textual descriptions (e.g., "low-drag supercritical wing for transonic conditions") and parametric airfoil geometries represented by class-shape transformation parameters. The proposed approach integrates a CLIP-inspired architecture that aligns text embeddings with airfoil parameter spaces through contrastive learning, along with a semantically conditioned decoder that produces physically plausible airfoil geometries from latent representations. The experimental results validate the framework's ability to generate aerodynamically plausible airfoils from natural language specifications and to classify airfoils accurately based on given textual labels. This research reduces the expertise threshold for preliminary airfoil design and highlights the potential for human-AI collaboration in aerospace engineering.
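The contrastive alignment at the heart of such a framework follows the CLIP recipe: project both modalities into a shared space and train with a symmetric InfoNCE loss. The sketch below assumes precomputed text embeddings and a 12-dimensional class-shape transformation (CST) vector; all dimensions and encoder choices are illustrative.

```python
# Hedged sketch of CLIP-style alignment between text embeddings and CST
# airfoil parameters: two projection heads into a shared space, trained with
# a symmetric InfoNCE loss. Dimensions and encoders are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextAirfoilCLIP(nn.Module):
    def __init__(self, text_dim=768, cst_dim=12, joint_dim=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, joint_dim)  # after a text encoder
        self.cst_proj = nn.Sequential(nn.Linear(cst_dim, 64), nn.ReLU(),
                                      nn.Linear(64, joint_dim))
        self.log_scale = nn.Parameter(torch.tensor(2.6593))  # ~log(1/0.07)

    def forward(self, text_emb, cst_params):
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        a = F.normalize(self.cst_proj(cst_params), dim=-1)
        logits = self.log_scale.exp() * t @ a.T   # pairwise similarities
        labels = torch.arange(len(t))
        # symmetric cross-entropy pulls matched (text, airfoil) pairs together
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.T, labels)) / 2

loss = TextAirfoilCLIP()(torch.randn(8, 768), torch.randn(8, 12))
```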
Topic modeling is a fundamental technique of content analysis in natural language processing, widely applied in domains such as the social sciences and finance. In the era of digital communication, social scientists increasingly rely on large-scale social media data to explore public discourse, collective behavior, and emerging social concerns. However, traditional models like Latent Dirichlet Allocation (LDA) and neural topic models like BERTopic struggle to capture deep semantic structures in short-text datasets, especially in complex non-English languages like Chinese. This paper presents Generative Language Model Topic (GLMTopic), a novel hybrid topic modeling framework leveraging the capabilities of large language models, designed to support social science research by uncovering coherent and interpretable themes from Chinese social media platforms. GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation, Uniform Manifold Approximation and Projection-based (UMAP-based) dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering, and large language model-powered (LLM-powered) representation tuning to generate more contextually relevant and interpretable topics. By reducing dependence on extensive text preprocessing and on human expert intervention in post-analysis topic label annotation, GLMTopic facilitates a fully automated and user-friendly topic extraction process. Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms LDA and BERTopic in coherence score and in usability with automated interpretation, providing a more scalable and semantically accurate solution for Chinese topic modeling. Future research will explore optimizing computational efficiency, integrating knowledge graphs and sentiment analysis for more complicated workflows, and extending the framework for real-time and multilingual topic modeling.
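The reduce-then-cluster core of the pipeline, UMAP followed by HDBSCAN over document embeddings, is sketched below; the parameter values are illustrative defaults, not GLMTopic's actual settings.

```python
# Sketch of the reduce-then-cluster core: UMAP dimensionality reduction over
# precomputed document embeddings, then HDBSCAN clustering. Parameter values
# are illustrative defaults, not GLMTopic's settings.
import numpy as np
import umap
import hdbscan

def cluster_documents(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (n_docs, dim) semantic vectors from a sentence encoder."""
    reduced = umap.UMAP(n_neighbors=15, n_components=5,
                        metric="cosine").fit_transform(embeddings)
    labels = hdbscan.HDBSCAN(min_cluster_size=20,
                             metric="euclidean").fit_predict(reduced)
    return labels  # -1 marks outliers; other values index topic clusters

# Top documents from each cluster can then be passed to an LLM to produce a
# readable topic label, replacing manual post-hoc annotation.
```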
基金supported by the project“Romanian Hub for Artificial Intelligence-HRIA”,Smart Growth,Digitization and Financial Instruments Program,2021–2027,MySMIS No.334906.
文摘Objective expertise evaluation of individuals,as a prerequisite stage for team formation,has been a long-term desideratum in large software development companies.With the rapid advancements in machine learning methods,based on reliable existing data stored in project management tools’datasets,automating this evaluation process becomes a natural step forward.In this context,our approach focuses on quantifying software developer expertise by using metadata from the task-tracking systems.For this,we mathematically formalize two categories of expertise:technology-specific expertise,which denotes the skills required for a particular technology,and general expertise,which encapsulates overall knowledge in the software industry.Afterward,we automatically classify the zones of expertise associated with each task a developer has worked on using Bidirectional Encoder Representations from Transformers(BERT)-like transformers to handle the unique characteristics of project tool datasets effectively.Finally,our method evaluates the proficiency of each software specialist across already completed projects from both technology-specific and general perspectives.The method was experimentally validated,yielding promising results.
文摘Since Google introduced the concept of Knowledge Graphs(KGs)in 2012,their construction technologies have evolved into a comprehensive methodological framework encompassing knowledge acquisition,extraction,representation,modeling,fusion,computation,and storage.Within this framework,knowledge extraction,as the core component,directly determines KG quality.In military domains,traditional manual curation models face efficiency constraints due to data fragmentation,complex knowledge architectures,and confidentiality protocols.Meanwhile,crowdsourced ontology construction approaches from general domains prove non-transferable,while human-crafted ontologies struggle with generalization deficiencies.To address these challenges,this study proposes an OntologyAware LLM Methodology for Military Domain Knowledge Extraction(LLM-KE).This approach leverages the deep semantic comprehension capabilities of Large Language Models(LLMs)to simulate human experts’cognitive processes in crowdsourced ontology construction,enabling automated extraction of military textual knowledge.It concurrently enhances knowledge processing efficiency and improves KG completeness.Empirical analysis demonstrates that this method effectively resolves scalability and dynamic adaptation challenges in military KG construction,establishing a novel technological pathway for advancing military intelligence development.
文摘The Heber River Diversion Dam (Heber Dam) and 3.6 km penstock on Vancouver island, British Columbia, Canada was built in 1953 and by 2009, it had reached the end of its useful life due to the deteriorated wooden structures. A decision was taken to remove the dam, return the flows in the Heber River to pre-dam conditions and restore the footprint of the dam and penstock. Plans were developed for removal of the dam and contaminated materials including the creosote coated wooden penstock and other wooden structures associated with the dam and site restoration. Work on removal and restoration was undertaken over the summer and fall of 2012 and the spring of 2013. Restoration treatments were based on the use of natural processes as a model for recovery. The recovery of dam and penstock removal disturbances was initiated in the late summer and fall of 2012 with the fall dispersal of seeds from mature pioneering species that formed a significant part of the local undisturbed vegetation. This paper describes the treatments that were applied to enhance the natural recovery of the disturbed areas and the results of those treatments. The restoration treatments were designed to address the filters that were present in project areas. These were identified during an initial inspection in 2009 and were centred on compaction of substrates and a lack of micro-sites. In addition to the use of natural processes for the restoration of project disturbances, a local First Nations crew was hired to transplant sword ferns (Polystichum munitum (Kaulf.) C. Presl) from the adjacent forest areas onto project sites to provide a social benefit from the restoration work.
基金supported by the National Natural Science Foundation of China(62222212).
文摘Information extraction(IE)aims to automatically identify and extract information about specific interests from raw texts.Despite the abundance of solutions based on fine-tuning pretrained language models,IE in the context of fewshot and zero-shot scenarios remains highly challenging due to the scarcity of training data.Large language models(LLMs),on the other hand,can generalize well to unseen tasks with few-shot demonstrations or even zero-shot instructions and have demonstrated impressive ability for a wide range of natural language understanding or generation tasks.Nevertheless,it is unclear,whether such effectiveness can be replicated in the task of IE,where the target tasks involve specialized schema and quite abstractive entity or relation concepts.In this paper,we first examine the validity of LLMs in executing IE tasks with an established prompting strategy and further propose multiple types of augmented prompting methods,including the structured fundamental prompt(SFP),the structured interactive reasoning prompt(SIRP),and the voting-enabled structured interactive reasoning prompt(VESIRP).The experimental results demonstrate that while directly promotes inferior performance,the proposed augmented prompt methods significantly improve the extraction accuracy,achieving comparable or even better performance(e.g.,zero-shot FewNERD,FewNERD-INTRA)than state-of-theart methods that require large-scale training samples.This study represents a systematic exploration of employing instruction-following LLM for the task of IE.It not only establishes a performance benchmark for this novel paradigm but,more importantly,validates a practical technical pathway through the proposed prompt enhancement method,offering a viable solution for efficient IE in low-resource settings.
文摘This editorial explores the transformative potential of artificial intelligence(AI)in identifying conflicts of interest(COIs)within academic and scientific research.By harnessing advanced data analysis,pattern recognition,and natural language processing techniques,AI offers innovative solutions for enhancing transparency and integrity in research.This editorial discusses how AI can automatically detect COIs,integrate data from various sources,and streamline reporting processes,thereby maintaining the credibility of scientific findings.
文摘DeepSeek Chinese artificial intelligence(AI)open-source model,has gained a lot of attention due to its economical training and efficient inference.DeepSeek,a model trained on large-scale reinforcement learning without supervised fine-tuning as a preliminary step,demonstrates remarkable reasoning capabilities of performing a wide range of tasks.DeepSeek is a prominent AI-driven chatbot that assists individuals in learning and enhances responses by generating insightful solutions to inquiries.Users possess divergent viewpoints regarding advanced models like DeepSeek,posting both their merits and shortcomings across several social media platforms.This research presents a new framework for predicting public sentiment to evaluate perceptions of DeepSeek.To transform the unstructured data into a suitable manner,we initially collect DeepSeek-related tweets from Twitter and subsequently implement various preprocessing methods.Subsequently,we annotated the tweets utilizing the Valence Aware Dictionary and sentiment Reasoning(VADER)methodology and the lexicon-driven TextBlob.Next,we classified the attitudes obtained from the purified data utilizing the proposed hybrid model.The proposed hybrid model consists of long-term,shortterm memory(LSTM)and bidirectional gated recurrent units(BiGRU).To strengthen it,we include multi-head attention,regularizer activation,and dropout units to enhance performance.Topic modeling employing KMeans clustering and Latent Dirichlet Allocation(LDA),was utilized to analyze public behavior concerning DeepSeek.The perceptions demonstrate that 82.5%of the people are positive,15.2%negative,and 2.3%neutral using TextBlob,and 82.8%positive,16.1%negative,and 1.2%neutral using the VADER analysis.The slight difference in results ensures that both analyses concur with their overall perceptions and may have distinct views of language peculiarities.The results indicate that the proposed model surpassed previous state-of-the-art approaches.
文摘Natural language processing(NLP)technologies,such as ChatGPT,are revolutionizing various fields,including finance research.This article explores the potential of Chat-GPT as a transformative tool for finance researchers.We illustrate various applications of ChatGPT in finance research,from analyzing financial charts and providing coding support to the theoretical derivation of financial models.Significant advances in multimodal learning,such as Visual Referring Prompting(VRP),are also explored for their potential to enhance ChatGPT’s image analysis capabilities.Furthermore,we conduct a comparative analysis of ChatGPT-3.5,ChatGPT-4,and Microsoft Bing to examine their distinct features,strengths,and weaknesses to provide valuable insights into their applicability in finance research.We demonstrate the innovative opportunities and insights provided by the development of ChatGPT to enrich the financial research process.By addressing the potential pitfalls and ethical considerations associated with using ChatGPT,we aim to promote responsible AI adoption and a more indepth understanding of the role of advanced NLP technologies in shaping the future of finance research and practice.Overall,this paper underscores ChatGPT’s transformative role in finance research,detailing its applications,benefits,and challenges,and advocating for ethical AI adoption to shape the future of the field.
基金funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University(IMSIU)(grant number IMSIU-DDRSP2504).
文摘Dialectal Arabic text classifcation(DA-TC)provides a mechanism for performing sentiment analysis on recent Arabic social media leading to many challenges owing to the natural morphology of the Arabic language and its wide range of dialect variations.Te availability of annotated datasets is limited,and preprocessing of the noisy content is even more challenging,sometimes resulting in the removal of important cues of sentiment from the input.To overcome such problems,this study investigates the applicability of using transfer learning based on pre-trained transformer models to classify sentiment in Arabic texts with high accuracy.Specifcally,it uses the CAMeLBERT model fnetuned for the Multi-Domain Arabic Resources for Sentiment Analysis(MARSA)dataset containing more than 56,000 manually annotated tweets annotated across political,social,sports,and technology domains.Te proposed method avoids extensive use of preprocessing and shows that raw data provides better results because they tend to retain more linguistic features.Te fne-tuned CAMeLBERT model produces state-of-the-art accuracy of 92%,precision of 91.7%,recall of 92.3%,and F1-score of 91.5%,outperforming standard machine learning models and ensemble-based/deep learning techniques.Our performance comparisons against other pre-trained models,namely AraBERTv02-twitter and MARBERT,show that transformer-based architectures are consistently the best suited when dealing with noisy Arabic texts.Tis work leads to a strong remedy for the problems in Arabic sentiment analysis and provides recommendations on easy tuning of the pre-trained models to adapt to challenging linguistic features and domain-specifc tasks.
文摘This review presents a comprehensive and forward-looking analysis of how Large Language Models(LLMs)are transforming knowledge discovery in the rational design of advancedmicro/nano electrocatalyst materials.Electrocatalysis is central to sustainable energy and environmental technologies,but traditional catalyst discovery is often hindered by high complexity,fragmented knowledge,and inefficiencies.LLMs,particularly those based on Transformer architectures,offer unprecedented capabilities in extracting,synthesizing,and generating scientific knowledge from vast unstructured textual corpora.This work provides the first structured synthesis of how LLMs have been leveraged across various electrocatalysis tasks,including automated information extraction from literature,text-based property prediction,hypothesis generation,synthesis planning,and knowledge graph construction.We comparatively analyze leading LLMs and domain-specific frameworks(e.g.,CatBERTa,CataLM,CatGPT)in terms of methodology,application scope,performance metrics,and limitations.Through curated case studies across key electrocatalytic reactions—HER,OER,ORR,and CO_(2)RR—we highlight emerging trends such as the growing use of embedding-based prediction,retrieval-augmented generation,and fine-tuned scientific LLMs.The review also identifies persistent challenges,including data heterogeneity,hallucination risks,lack of standard benchmarks,and limited multimodal integration.Importantly,we articulate future research directions,such as the development of multimodal and physics-informedMatSci-LLMs,enhanced interpretability tools,and the integration of LLMswith selfdriving laboratories for autonomous discovery.By consolidating fragmented advances and outlining a unified research roadmap,this review provides valuable guidance for both materials scientists and AI practitioners seeking to accelerate catalyst innovation through large language model technologies.
基金financial support from the National Science Foundation(NSF)EPSCoR R.I.I.Track-2 Program,awarded under the NSF grant number 2119691.
文摘The increasing frequency and severity of natural disasters,exacerbated by global warming,necessitate novel solutions to strengthen the resilience of Critical Infrastructure Systems(CISs).Recent research reveals the sig-nificant potential of natural language processing(NLP)to analyze unstructured human language during disasters,thereby facilitating the uncovering of disruptions and providing situational awareness supporting various aspects of resilience regarding CISs.Despite this potential,few studies have systematically mapped the global research on NLP applications with respect to supporting various aspects of resilience of CISs.This paper contributes to the body of knowledge by presenting a review of current knowledge using the scientometric review technique.Using 231 bibliographic records from the Scopus and Web of Science core collections,we identify five key research areas where researchers have used NLP to support the resilience of CISs during natural disasters,including sentiment analysis,crisis informatics,data and knowledge visualization,disaster impacts,and content analysis.Furthermore,we map the utility of NLP in the identified research focus with respect to four aspects of resilience(i.e.,preparedness,absorption,recovery,and adaptability)and present various common techniques used and potential future research directions.This review highlights that NLP has the potential to become a supplementary data source to support the resilience of CISs.The results of this study serve as an introductory-level guide designed to help scholars and practitioners unlock the potential of NLP for strengthening the resilience of CISs against natural disasters.
基金financially supported by the First-Class Curriculum Program at the School of Economics and Management,University of the Chinese Academy of Sciencesthe National Natural Science Foundation of China(42041005)the National Social Science Foundation of China(23BTQ054)。
文摘Grassland degradation presents overwhelming challenges to biodiversity,ecosystem services,and the socioeconomic sustainability of dependent communities.However,a comprehensive synthesis of global knowledge on the frontiers and key areas of grassland degradation research has not been achieved due to the limitations of traditional scientometrics methods.The present synthesis of information employed BERTopic,an advanced natural language processing tool,to analyze the extensive ecological literature on grassland degradation.We compiled a dataset of 4,504 publications from the Web of Science core collection database and used it to evaluate the geographic distribution and temporal evolution of different grassland types and available knowledge on the subject.Our analysis identified key topics in the global grassland degradation research domain,including the effects of grassland degradation on ecosystem functions,grassland ecological restoration and biodiversity conservation,erosion processes and hydrological models in grasslands,and others.The BERTopic analysis significantly outperforms traditional methods in identifying complex and evolving topics in large datasets of literature.Compared to traditional scientometrics analysis,BERTopic provides a more comprehensive perspective on the research areas,revealing not only popular topics but also emerging research areas that traditional methods may overlook,although scientometrics offers more specificity and detail.Therefore,we argue for the simultaneous use of both approaches to achieve more systematic and comprehensive assessments of specific research areas.This study represents an emerging application of BERTopic algorithms in ecological research,particularly in the critical research focused on global grassland degradation.It also highlights the need for integrating advanced computational methods in ecological research in this era of data explosion.Tools like the BERTopic algorithm are essential for enhancing our understanding of complex environmental problems,and it marks an important stride towards more sophisticated,data-driven analysis in ecology.
文摘The increasing fluency of advanced language models,such as GPT-3.5,GPT-4,and the recently introduced DeepSeek,challenges the ability to distinguish between human-authored and AI-generated academic writing.This situation is raising significant concerns regarding the integrity and authenticity of academic work.In light of the above,the current research evaluates the effectiveness of Bidirectional Long Short-TermMemory(BiLSTM)networks enhanced with pre-trained GloVe(Global Vectors for Word Representation)embeddings to detect AIgenerated scientific Abstracts drawn from the AI-GA(Artificial Intelligence Generated Abstracts)dataset.Two core BiLSTM variants were assessed:a single-layer approach and a dual-layer design,each tested under static or adaptive embeddings.The single-layer model achieved nearly 97%accuracy with trainable GloVe,occasionally surpassing the deeper model.Despite these gains,neither configuration fully matched the 98.7%benchmark set by an earlier LSTMWord2Vec pipeline.Some runs were over-fitted when embeddings were fine-tuned,whereas static embeddings offered a slightly lower yet stable accuracy of around 96%.This lingering gap reinforces a key ethical and procedural concern:relying solely on automated tools,such as Turnitin’s AI-detection features,to penalize individuals’risks and unjust outcomes.Misclassifications,whether legitimate work is misread as AI-generated or engineered text,evade detection,demonstrating that these classifiers should not stand as the sole arbiters of authenticity.Amore comprehensive approach is warranted,one which weaves model outputs into a systematic process supported by expert judgment and institutional guidelines designed to protect originality.
文摘As a result of breakthroughs in computational approaches mixed with a boom in multi-omics data,the development of numerous digital medicines and bioinformatics tools have aided in speeding up the healthcare industry process.The traditional healthcare development method has been further rationalized with the introduction of artificial intelligence(AI),deep learning(DL),and machine learning(ML).Wide-ranging biological and clinical data in the form of big data,which is stored in various databases worldwide,serve as the raw material for AI-based methods and aid in the precise identification of patterns and models.These patterns and models can be used to identify novel therapeutically active molecules with significantly less time,financial investment,and workforce.This review article provides insights into understanding the principles of AI technologies such as next-generation sequencing(NGS),natural language processing(NLP),radiological images,patients-electronic medical records(EMR),and drug discovery as well as how they should be used in ethical,economic,and social ramifications of AI.This review also highlights various applications of AI in the healthcare industry,along with the analyses of different AI technologies.Additionally,it will offer helpful suggestions to assist decision-makers in creating an AI plan that would support their shift to a digital healthcare system.
Abstract: Urdu, a prominent subcontinental language, serves as a versatile means of communication. However, its handwritten expressions present challenges for optical character recognition (OCR). While various OCR techniques have been proposed, most of them focus on recognizing printed Urdu characters and digits. To the best of our knowledge, very little research has focused solely on recognition of pure Urdu handwriting, and the results of the methods proposed so far are often inadequate. In this study, we introduce a novel approach to recognizing handwritten Urdu digits and characters using Convolutional Neural Networks (CNN). Our proposed method utilizes convolutional layers to extract important features from input images and classifies them using fully connected layers, enabling efficient and accurate detection of Urdu handwritten digits and characters. We implemented the proposed technique on a large publicly available dataset of Urdu handwritten digits and characters. The findings demonstrate that the CNN model achieves an accuracy of 98.30% and an F1 score of 88.6%, indicating its effectiveness in detecting and classifying Urdu handwritten digits and characters. These results have far-reaching implications for various applications, including document analysis, text recognition, and language understanding, which have previously been unexplored in the context of Urdu handwriting data. This work lays a solid foundation for future research and development in Urdu language detection and processing, opening up new opportunities for advancement in this field.
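The sketch below illustrates the convolutional-plus-fully-connected pattern the abstract describes, assuming TensorFlow/Keras; the 64x64 grayscale input and the class count are illustrative assumptions rather than the paper's exact configuration.

```python
# A minimal sketch of a CNN for handwritten character classification in the
# spirit of the paper; input size and class count are illustrative.
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 40  # hypothetical: Urdu digits plus characters

model = models.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),         # grayscale glyph image
    layers.Conv2D(32, 3, activation="relu"),   # low-level stroke features
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # higher-level glyph features
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),      # fully connected classifier head
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```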
Funding: Supported by the National Natural Science Foundation of China (Nos. U22A2034 and 62177047), the High-Caliber Foreign Experts Introduction Plan funded by MOST, and the Central South University Research Programme of Advanced Interdisciplinary Studies (No. 2023QYJC020).
Abstract: Image captioning has seen significant research efforts over the last decade. The goal is to generate meaningful semantic sentences that describe the visual content depicted in photographs and are syntactically accurate. Many real-world applications rely on image captioning, such as helping people with visual impairments to see their surroundings. To formulate a coherent and relevant textual description, computer vision techniques are utilized to comprehend the visual content within an image, followed by natural language processing methods. Numerous approaches and models have been developed to deal with this multifaceted problem, and several of them prove to be state-of-the-art solutions in this field. This work offers an exclusive perspective emphasizing the most critical strategies and techniques for enhancing image caption generation. Rather than reviewing all previous image captioning work, we analyze various techniques that significantly improve image caption generation and achieve substantial performance gains, encompassing image captioning with visual attention methods, exploring semantic information types in captions, and employing multi-caption generation techniques. Further, advancements such as neural architecture search, few-shot learning, multi-phase learning, and cross-modal embedding within image caption networks are examined for their transformative effects. The comprehensive quantitative analysis conducted in this study identifies cutting-edge methodologies and sheds light on their profound impact, driving forward the forefront of image captioning technology.
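To make the visual-attention idea concrete, here is a toy sketch of a single soft-attention decoding step of the kind many captioning models use, assuming PyTorch; all shapes and the additive scoring function are illustrative assumptions.

```python
# A toy sketch of one soft visual-attention step in a captioning decoder;
# dimensions are illustrative, not taken from any specific surveyed model.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, hid_dim = 512, 256
features = torch.randn(1, 49, feat_dim)   # 7x7 grid of CNN image features
h = torch.randn(1, hid_dim)               # current decoder hidden state

attn = nn.Linear(feat_dim + hid_dim, 1)
# Score each spatial location against the decoder state.
scores = attn(torch.cat([features, h.unsqueeze(1).expand(-1, 49, -1)], dim=-1))
alpha = F.softmax(scores, dim=1)          # where to look for the next word
context = (alpha * features).sum(dim=1)   # attended image summary
print(context.shape)                      # feeds the decoder's next word step
```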
Funding: Tianjin Education Commission Research Program, Humanities and Social Sciences (Project No. 2022SK064); Innovation Training Program at Tianjin Normal University in 2024, "Research on the Function of Rural Norms in Source Governance of Disputes from the Perspective of Rural Revitalization" (Project No. 202410065027).
Abstract: This study examines the role of village regulations within China's Litigation Source Governance (LSG) framework, specifically analyzing Tianjin Municipality's 2023 Model Village Regulations. Employing legal analysis and Natural Language Processing (NLP) techniques, the research evaluates the effectiveness, enforceability, and thematic orientation of these regulations in grassroots dispute resolution. Findings reveal a pronounced reliance on moral governance provisions, limited judicial recognition, and significant implementation challenges due to the predominance of non-binding (soft) clauses. The study recommends enhancing judicial recognition through formal confirmation mechanisms, increasing legally binding clauses, and integrating village-level governance more closely with formal judicial processes. This approach not only strengthens local governance but also provides valuable insights for nationwide replication, supporting broader goals of rural stability and governance modernization.
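The abstract does not detail the NLP pipeline, so the following is only a hypothetical illustration of how clause bindingness might be screened at scale: a toy heuristic that tags clauses as soft or hard from Chinese modal cues, where the cue lists and sample clauses are assumptions of this sketch, not the study's method.

```python
# A hypothetical toy screen for binding vs. non-binding clauses; the cue
# lists below are illustrative assumptions, not the study's actual method.
HARD_CUES = ["必须", "不得", "应当", "禁止"]   # obligation / prohibition markers
SOFT_CUES = ["提倡", "鼓励", "倡导", "宜"]     # exhortatory (moral) markers

def classify_clause(clause: str) -> str:
    if any(cue in clause for cue in HARD_CUES):
        return "hard (binding)"
    if any(cue in clause for cue in SOFT_CUES):
        return "soft (non-binding)"
    return "unclassified"

sample_clauses = [
    "村民必须按规定缴纳卫生费。",   # hypothetical clause: obligation
    "提倡邻里互助、和睦相处。",     # hypothetical clause: exhortation
]
for clause in sample_clauses:
    print(classify_clause(clause), "<-", clause)
```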
Abstract: X (formerly known as Twitter) is one of the most prominent social media platforms, enabling users to share short messages (tweets) with the public or their followers. It serves various purposes, from real-time news dissemination and political discourse to trend spotting and consumer engagement. X has emerged as a key space for understanding shifting brand perceptions, consumer preferences, and product-related sentiment in the fashion industry. However, the platform's informal, dynamic, and context-dependent language poses substantial challenges for sentiment analysis, particularly when attempting to detect sarcasm, slang, and nuanced emotional tones. This study introduces a hybrid deep learning framework that integrates Transformer encoders, recurrent neural networks (i.e., Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU)), and attention mechanisms to improve the accuracy of fashion-related sentiment classification. These methods were selected for their proven strength in capturing both contextual dependencies and sequential structures, which are essential for interpreting short-form text. Our model was evaluated on a dataset of 20,000 fashion tweets. The experimental results demonstrate a classification accuracy of 92.25%, outperforming conventional models such as Logistic Regression, Linear Support Vector Machine (SVM), and even a standalone LSTM by a margin of up to 8%. This improvement highlights the importance of hybrid architectures in handling noisy, informal social media data. The study's findings have strong implications for digital marketing and brand management, where timely sentiment detection is critical. Despite the promising results, challenges remain in precisely identifying negative sentiments, indicating that further work is needed to detect subtle and contextually embedded expressions.
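A minimal sketch of such a hybrid stack is given below, assuming TensorFlow/Keras; the vocabulary size, layer widths, and the three-class output are illustrative assumptions rather than the paper's reported configuration.

```python
# A minimal sketch of a hybrid Transformer + BiLSTM/GRU + attention classifier
# for short-text sentiment; all hyperparameters are illustrative.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 30000, 128, 60

inputs = layers.Input(shape=(max_len,))
x = layers.Embedding(vocab_size, embed_dim)(inputs)

# Transformer-style encoder block: self-attention plus a feed-forward sublayer.
attn = layers.MultiHeadAttention(num_heads=4, key_dim=embed_dim // 4)(x, x)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(embed_dim, activation="relu")(x)
x = layers.LayerNormalization()(x + ff)

# Recurrent branch captures the sequential structure of the tweet.
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
x = layers.GRU(64, return_sequences=True)(x)

# Simple additive attention pooling over time steps.
scores = layers.Dense(1, activation="tanh")(x)
weights = layers.Softmax(axis=1)(scores)
context = layers.Dot(axes=1)([weights, x])  # attention-weighted sum over time
context = layers.Flatten()(context)

outputs = layers.Dense(3, activation="softmax")(context)  # neg / neutral / pos
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```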
Funding: Supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korean government (Ministry of Science and ICT) (IITP-2025-RS-2024-00438056).
Abstract: The increased accessibility of social networking services (SNSs) has facilitated communication and information sharing among users. However, it has also heightened concerns about digital safety, particularly for children and adolescents, who are increasingly exposed to online grooming crimes. Early and accurate identification of grooming conversations is crucial in preventing long-term harm to victims. However, research on grooming detection in South Korea remains limited, as existing models are trained primarily on English text and fail to reflect the unique linguistic features of SNS conversations, leading to inaccurate classifications. To address these issues, this study proposes a novel framework that integrates optical character recognition (OCR) technology with KcELECTRA, a deep learning-based natural language processing (NLP) model that shows excellent performance in processing colloquial Korean. In the proposed framework, the KcELECTRA model is fine-tuned on an extensive dataset, including Korean social media conversations, Korean ethical verification data from AI-Hub, and Korean hate speech data from HuggingFace, to enable more accurate classification of text extracted from social media conversation images. Experimental results show that the proposed framework achieves an accuracy of 0.953, outperforming existing transformer-based models. Furthermore, the OCR technology shows high accuracy in extracting text from images, demonstrating that the proposed framework is effective for online grooming detection. The proposed framework is expected to contribute to more accurate detection of grooming text and the prevention of grooming-related crimes.
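The OCR-then-classify flow could look roughly like the sketch below, assuming the pytesseract and transformers packages; "beomi/KcELECTRA-base" is the public KcELECTRA checkpoint, while the screenshot path and the two-label head are illustrative assumptions (the freshly initialized head would need fine-tuning before its outputs mean anything).

```python
# A rough sketch of the OCR + KcELECTRA pipeline; the classification head
# here is untrained and the labels are assumed, so this is illustrative only.
import torch
import pytesseract
from PIL import Image
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("beomi/KcELECTRA-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "beomi/KcELECTRA-base", num_labels=2)  # grooming vs. benign (assumed labels)

# Step 1: extract Korean text from a chat screenshot (path is hypothetical).
text = pytesseract.image_to_string(Image.open("chat_screenshot.png"), lang="kor")

# Step 2: classify the extracted conversation text.
inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("grooming probability:", torch.softmax(logits, dim=-1)[0, 1].item())
```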
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. U23A2069, 12372288, 12388101, and 92152301), the Jilin Province Science and Technology Development Program, China (Grant No. 20220301013GX), and the Aeronautical Science Foundation of China (Grant No. 2020Z006058002).
Abstract: Recent advances in contrastive language-image pretraining (CLIP) models and generative AI have demonstrated significant capabilities in cross-modal understanding and content generation. Building on these developments, this study introduces a novel framework for airfoil design via natural language interfaces. To the authors' knowledge, this study establishes the first end-to-end, bidirectional mapping between textual descriptions (e.g., "low-drag supercritical wing for transonic conditions") and parametric airfoil geometries represented by class-shape transformation parameters. The proposed approach integrates a CLIP-inspired architecture that aligns text embeddings with airfoil parameter spaces through contrastive learning, along with a semantically conditioned decoder that produces physically plausible airfoil geometries from latent representations. The experimental results validate the framework's ability to generate aerodynamically plausible airfoils from natural language specifications and to classify airfoils accurately based on given textual labels. This research lowers the expertise threshold for preliminary airfoil design and highlights the potential for human-AI collaboration in aerospace engineering.
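A minimal sketch of the contrastive-alignment idea is shown below, assuming PyTorch; the encoders, embedding dimensions, the 12-parameter CST vector, and the random batch are all illustrative assumptions rather than the paper's architecture.

```python
# A minimal sketch of CLIP-style contrastive alignment between text
# embeddings and airfoil parameter vectors; all dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextAirfoilCLIP(nn.Module):
    def __init__(self, text_dim=768, cst_dim=12, shared_dim=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)   # projects text embeddings
        self.airfoil_enc = nn.Sequential(                  # encodes CST parameters
            nn.Linear(cst_dim, 64), nn.ReLU(), nn.Linear(64, shared_dim))
        self.logit_scale = nn.Parameter(torch.tensor(2.0)) # learnable temperature

    def forward(self, text_emb, cst_params):
        t = F.normalize(self.text_proj(text_emb), dim=-1)
        a = F.normalize(self.airfoil_enc(cst_params), dim=-1)
        return self.logit_scale.exp() * t @ a.T            # pairwise similarities

model = TextAirfoilCLIP()
text_emb = torch.randn(8, 768)   # stand-in for sentence-encoder outputs
cst = torch.randn(8, 12)         # stand-in for matched CST parameter vectors

logits = model(text_emb, cst)
labels = torch.arange(8)         # matched text/airfoil pairs lie on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
loss.backward()
print(loss.item())
```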
Funding: Funded by the Natural Science Foundation of Fujian Province, China (Grant No. 2022J05291).
Abstract: Topic modeling is a fundamental technique of content analysis in natural language processing, widely applied in domains such as the social sciences and finance. In the era of digital communication, social scientists increasingly rely on large-scale social media data to explore public discourse, collective behavior, and emerging social concerns. However, traditional models like Latent Dirichlet Allocation (LDA) and neural topic models like BERTopic struggle to capture deep semantic structures in short-text datasets, especially in complex non-English languages like Chinese. This paper presents Generative Language Model Topic (GLMTopic), a novel hybrid topic modeling framework leveraging the capabilities of large language models, designed to support social science research by uncovering coherent and interpretable themes from Chinese social media platforms. GLMTopic integrates Adaptive Community-enhanced Graph Embedding for advanced semantic representation, Uniform Manifold Approximation and Projection-based (UMAP-based) dimensionality reduction, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering, and large language model-powered (LLM-powered) representation tuning to generate more contextually relevant and interpretable topics. By reducing dependence on extensive text preprocessing and on human expert intervention in post-analysis topic label annotation, GLMTopic facilitates a fully automated and user-friendly topic extraction process. Experimental evaluations on a social media dataset sourced from Weibo demonstrate that GLMTopic outperforms LDA and BERTopic in coherence score and in usability with automated interpretation, providing a more scalable and semantically accurate solution for Chinese topic modeling. Future research will explore optimizing computational efficiency, integrating knowledge graphs and sentiment analysis for more complicated workflows, and extending the framework to real-time and multilingual topic modeling.
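To ground the embed-reduce-cluster portion of such a pipeline, here is a minimal sketch assuming the sentence-transformers, umap-learn, and hdbscan packages; the multilingual encoder name, the toy Weibo-style posts, and their repetition are illustrative assumptions, and the LLM-powered labeling step is only indicated in a comment.

```python
# A minimal sketch of the embed -> UMAP -> HDBSCAN stage of a pipeline like
# the one described; the encoder and toy posts are illustrative assumptions.
import umap
import hdbscan
from sentence_transformers import SentenceTransformer

posts = [
    "房价又涨了，年轻人压力好大",
    "新能源汽车销量创新高",
    "今天股市大跌，基金全绿",
] * 10  # repeated only so the clustering steps have enough documents

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = encoder.encode(posts)

# Reduce dimensionality before density-based clustering.
reduced = umap.UMAP(n_neighbors=10, n_components=5,
                    metric="cosine").fit_transform(embeddings)
clusters = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(reduced)

# Each cluster would then be handed to an LLM to produce a readable topic label.
print(set(clusters))
```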