Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities such as in-context learning that smaller models lack. The advancement in Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Students are considered one of the groups most affected by psychological problems. Given the highly dangerous nature of mental illnesses and the increasingly serious state of global mental health, it is imperative to explore new methods and approaches for the prevention and treatment of mental illnesses. Large multimodal models (LMMs), as the most advanced artificial intelligence models (e.g., ChatGPT-4), have brought new hope to the accurate prevention, diagnosis, and treatment of psychiatric disorders. The assistance of these models in the promotion of mental health is critical, as the latter necessitates a strong foundation of medical knowledge and professional skills, emotional support, stigma mitigation, the encouragement of more honest patient self-disclosure, reduced health care costs, improved medical efficiency, and greater mental health service coverage. However, these models must simultaneously address challenges related to health, safety, hallucinations, and ethics. In the future, we should address these challenges by developing relevant usage manuals, accountability rules, and legal regulations; implementing a human-centered approach; and intelligently upgrading LMMs through the deep optimization of such models, their algorithms, and other means. This effort will thus substantially contribute not only to the maintenance of students' health but also to the achievement of global sustainable development goals.
Generative Artificial Intelligence (GAI) refers to a class of AI systems capable of creating novel, coherent, and contextually relevant content, such as text, images, audio, and video, based on patterns learned from extensive training datasets. The public release and rapid refinement of large language models (LLMs) like ChatGPT have accelerated the adoption of GAI across various medical specialties, offering new tools for education, clinical simulation, and research. Dermatology training, which relies heavily on visual pattern recognition and requires extensive exposure to diverse morphological presentations, faces persistent challenges such as uneven distribution of educational resources, limited patient exposure for rare conditions, and variability in teaching quality. Exploring the integration of GAI into pedagogical frameworks offers innovative approaches to address these challenges, potentially enhancing the quality, standardization, scalability, and accessibility of dermatology education. This comprehensive review examines the core concepts and technical foundations of GAI, highlights its specific applications within dermatology teaching and learning, including simulated case generation, personalized learning pathways, and academic support, and discusses the current limitations, practical challenges, and ethical considerations surrounding its use. The aim is to provide a balanced perspective on the significant potential of GAI for transforming dermatology education and to offer evidence-based insights to guide future exploration, implementation, and policy development.
Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, correlations between different parts of a document, and domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving essential for analyzing content and answering questions accurately. This paper evaluates the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art visual language models, is used to identify the relevant page containing the appropriate chart for each question. Google's Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visually rich document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organization and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected for their multimodal nature, combining dense textual analysis with varied visual representations and thus presenting realistic challenges for vision-language models. This comparison assesses how advanced models perform with different input formats and whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
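ColPali-style retrieval scores each candidate page by late interaction: every query-token embedding is matched against every page-patch embedding, keeping the maximum similarity per token and summing (the MaxSim rule). A minimal sketch of that scoring rule, with random stand-in embeddings in place of real model outputs:

```python
import numpy as np

def maxsim_score(query_emb, page_emb):
    """Late-interaction (MaxSim) score: for each query-token embedding,
    take its best cosine similarity over all page-patch embeddings, then sum."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = page_emb / np.linalg.norm(page_emb, axis=1, keepdims=True)
    sim = q @ p.T                        # (n_query_tokens, n_patches)
    return float(sim.max(axis=1).sum())

def retrieve_best_page(query_emb, pages):
    """Return the index of the page scoring highest for the query."""
    scores = [maxsim_score(query_emb, page) for page in pages]
    return int(np.argmax(scores))

# Random stand-in embeddings; a real system would use encoder outputs.
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))                        # 8 query tokens
pages = [rng.normal(size=(196, 128)) for _ in range(5)]  # 5 candidate pages
best = retrieve_best_page(query, pages)
print(best)
```

The highest-scoring page is then passed, as an image, to the multimodal model for question answering.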
The application of visual-language large models in the field of medical health has gradually become a research focus. These models combine image understanding with natural language processing and can simultaneously process multi-modality data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively integrating medical information and providing strong support for clinical decision-making and disease diagnosis. Visual-language large models perform well on specific medical tasks and also show strong potential and a high degree of intelligence as general task models. This paper provides a comprehensive review of visual-language large models in the field of medical health. Specifically, it first introduces their theoretical foundations and technical principles. It then introduces specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed, and four potential future development directions are given in light of these challenges.
Automated detection of suspended anomalous objects on high-speed railway catenary systems using computer vision-based technology is a critical task for ensuring railway transportation safety. Despite the critical importance of this task, conventional vision-based foreign object detection methodologies have predominantly concentrated on image data, neglecting the exploration and integration of textual information. The currently popular multimodal model Contrastive Language-Image Pre-training (CLIP) employs contrastive learning to enable simultaneous understanding of both visual and textual modalities. Drawing inspiration from CLIP's capabilities, this paper introduces a novel CLIP-based multimodal foreign object detection model tailored for railway applications, referred to as Railway-CLIP. This model leverages CLIP's robust generalization capabilities to enhance performance in the context of catenary foreign object detection. The Railway-CLIP model is primarily composed of an image encoder and a text encoder. Initially, the Segment Anything Model (SAM) is employed to preprocess raw images, identifying candidate bounding boxes that may contain foreign objects. Both the original images and the detected candidate bounding boxes are subsequently fed into the image encoder to extract their respective visual features. In parallel, distinct prompt templates are crafted for both the original images and the candidate bounding boxes to serve as textual inputs. These prompts are then processed by the text encoder to derive textual features. The image and text encoders collaboratively project the multimodal features into a shared semantic space, facilitating the computation of similarity scores between visual and textual representations. The final detection results are determined based on these similarity scores, ensuring robust and accurate identification of anomalous objects. Extensive experiments on our collected Railway Anomaly Dataset (RAD) demonstrate that the proposed Railway-CLIP outperforms previous state-of-the-art methods, achieving 97.25% AUROC and a 92.66% F1-score, thereby validating the effectiveness and superiority of the proposed approach in real-world high-speed railway anomaly detection scenarios.
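The similarity-scoring step described above, projecting region and prompt features into a shared space and labeling each candidate box by its closest prompt, can be sketched roughly as follows. The prompt labels and feature values here are stand-ins; the real pipeline obtains features from the CLIP encoders over SAM-proposed boxes:

```python
import numpy as np

def classify_regions(image_feats, text_feats, labels, temperature=0.01):
    """Label each candidate region with its most similar text prompt.

    image_feats: (n_regions, d) visual features of candidate boxes
    text_feats:  (n_labels, d) features of the encoded prompt templates
    """
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # scaled cosine similarities
    # Softmax over prompts gives per-region confidence scores.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    preds = probs.argmax(axis=1)
    return [labels[i] for i in preds], probs

# Hypothetical prompt set and random stand-in features.
labels = ["normal catenary component", "suspended foreign object"]
rng = np.random.default_rng(1)
region_feats = rng.normal(size=(4, 512))   # 4 SAM candidate boxes
prompt_feats = rng.normal(size=(2, 512))   # 2 encoded prompts
preds, probs = classify_regions(region_feats, prompt_feats, labels)
print(preds)
```

Thresholding the "foreign object" probability rather than taking the argmax is an equally common design when false negatives are costly.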
Sarcasm detection in Natural Language Processing (NLP) has become increasingly important, particularly with the rise of social media and non-textual emotional expressions, such as images. Existing methods often rely on separate image and text modalities, which may not fully utilize the information available from both sources. To address this limitation, we propose a novel multimodal large model, the PKME-MLM (Prior Knowledge and Multi-label Emotion analysis based Multimodal Large Model for sarcasm detection). The PKME-MLM aims to enhance sarcasm detection by integrating prior knowledge to extract useful textual information from images, which is then combined with text data for deeper analysis. This method improves the integration of image and text data, addressing the limitation of previous models that process these modalities separately. Additionally, we incorporate multi-label sentiment analysis, refining sentiment labels to improve sarcasm recognition accuracy. This design overcomes the limitations of prior models that treated sentiment classification as a single-label problem, thereby improving sarcasm recognition by distinguishing subtle emotional cues in the text. Experimental results demonstrate that our approach achieves significant performance improvements on multimodal sarcasm detection tasks, with an accuracy (Acc.) of 94.35% and macro-average precision and recall reaching 93.92% and 94.21%, respectively. These results highlight the potential of multimodal models in improving sarcasm detection and suggest that further integration of modalities could advance future research. This work also paves the way for incorporating multimodal sentiment analysis into sarcasm detection.
Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks. AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases. METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians. RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach increased input and output token counts, leading to higher computational costs and extended diagnostic times. CONCLUSION: The framework successfully integrates multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration and opening new horizons for the clinical application of artificial intelligence-assisted technology.
Large language models (LLMs) have emerged as transformative tools in radiology artificial intelligence (AI), offering significant capabilities in areas such as image report generation, clinical decision support, and workflow optimization. The first part of this manuscript presents a comprehensive overview of the current state of LLM applications in radiology, including their historical evolution, technical foundations, and practical uses. Despite notable advances, inherent architectural constraints, such as token-level sequential processing, limit their ability to perform deep abstract reasoning and holistic contextual understanding, which are critical for fine-grained diagnostic interpretation. We provide a critical perspective on current LLMs and discuss key challenges, including model reliability, bias, and explainability, highlighting the pressing need for novel approaches to advance radiology AI. Large concept models (LCMs) represent a nascent and promising paradigm in radiology AI, designed to transcend the limitations of token-level processing by utilizing higher-order conceptual representations and multimodal data integration. The second part of this manuscript introduces the foundational principles and theoretical framework of LCMs, highlighting their potential to facilitate enhanced semantic reasoning, long-range context synthesis, and improved clinical decision-making. Critically, the core of this section is the proposal of a novel theoretical framework for LCMs, formalized and extended from our group's foundational concept-based models, the world's earliest articulation of this paradigm for medical AI. This conceptual shift has since been externally validated and propelled by the recent publication of the LCM architectural proposal by Meta AI, providing a large-scale engineering blueprint for the future development of this technology. We also outline future research directions and the transformative implications of this emerging AI paradigm for radiologic practice, aiming to provide a blueprint for advancing toward human-like conceptual understanding in AI. While challenges persist, we are at the very beginning of a new era, and it is not unreasonable to hope that future advancements will overcome these hurdles, pushing the boundaries of AI in radiology far beyond even today's most state-of-the-art models.
User identity linkage (UIL) refers to identifying user accounts belonging to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully exploit the rich image resources generated by users, and existing attempts that touch on the multimodal domain still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated contents (UGCs). We introduce the efficient user identity linkage via aligned multi-modal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. Accordingly, we construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, demonstrating its strength in handling multimodal data and providing strong algorithmic support for UIL.
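The abstract does not specify the temporal weight assignment mechanism; one plausible form, assumed purely for illustration, is exponential recency decay that down-weights older posts before pooling their aligned multimodal features into a single user representation:

```python
import numpy as np

def temporal_pool(post_feats, post_ages_days, half_life=30.0):
    """Pool per-post features into one user vector, down-weighting old posts.

    post_feats:     (n_posts, d) aligned multimodal features (e.g. from CLIP)
    post_ages_days: (n_posts,) age of each post in days
    half_life:      age at which a post's weight halves (assumed value)
    """
    ages = np.asarray(post_ages_days, dtype=float)
    weights = 0.5 ** (ages / half_life)        # exponential recency decay
    weights /= weights.sum()                   # normalize to a weighted mean
    return weights @ np.asarray(post_feats)    # shape (d,)

feats = np.eye(3)  # 3 toy posts with orthogonal one-hot "features"
user_vec = temporal_pool(feats, post_ages_days=[0, 30, 60])
print(user_vec.round(3))  # → [0.571 0.286 0.143]
```

With orthogonal toy features, the pooled vector directly exposes the weights: today's post contributes twice as much as the 30-day-old one, which contributes twice as much as the 60-day-old one.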
Due to the trend toward digital transformation among cultural institutions and the substantial influence of social media platforms, the demand for visual communication to promote traditional cultural artifacts online keeps increasing. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods rely mainly on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. We therefore propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language Model (MLLM) with integrated iterative optimization. During our research, we collaborated with designers to define principles of graphic design for cultural artifact posters, which guide the MLLM in generating layout parameters. We then render these parameters into posters and finally refine the posters using an MLLM integrated with a multi-round iterative optimization mechanism. Qualitative results show that CAPGen consistently outperforms baseline methods in both visual quality and aesthetic performance. Furthermore, ablation studies indicate that the prompt, the iterative optimization mechanism, and the design principles each significantly enhance the effectiveness of poster generation.
This article elucidates the concept of large model technology, summarizes the state of large model research both domestically and internationally, provides an overview of the application of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be broadly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges such as data quantity and quality currently insufficient to support the training of large models, high research and development costs, and poor algorithmic autonomy and control. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the construction of "artificial intelligence + energy" composite teams, and boost the autonomy and control of large model technology.
The additive manufacturing (AM) landscape has significantly transformed in alignment with Industry 4.0 principles, primarily driven by the integration of artificial intelligence (AI) and digital twins (DT). However, current intelligent AM (IAM) systems face limitations such as fragmented AI tool usage and suboptimal human-machine interaction. This paper reviews existing IAM solutions, emphasizing control, monitoring, process autonomy, and end-to-end integration, and identifies key limitations, such as the absence of a high-level controller for global decision-making. To address these gaps, we propose a transition from IAM to autonomous AM, featuring a hierarchical framework with four integrated layers: knowledge, generative solution, operational, and cognitive. In the cognitive layer, AI agents notably enable machines to independently observe, analyze, plan, and execute operations that traditionally require human intervention. These capabilities streamline production processes and expand the possibilities for innovation, particularly in sectors like in-space manufacturing. Additionally, this paper discusses the role of AI in self-optimization and lifelong learning, positing that the future of AM will be characterized by a symbiotic relationship between human expertise and advanced autonomy, fostering a more adaptive, resilient manufacturing ecosystem.
The rapid advancement of large models has led to the development of increasingly sophisticated models capable of generating diverse, personalized, and high-quality content. Among these, DeepSeek has emerged as a pivotal open-source initiative, demonstrating high performance at significantly lower computation costs compared to closed-source counterparts. This survey provides a comprehensive overview of the DeepSeek family of models, including DeepSeek-V3 and DeepSeek-R1, covering their core innovations in architecture, system pipeline, algorithms, and infrastructure. We explore their practical applications across various domains, such as healthcare, finance, and education, highlighting their impact on both industry and society. Furthermore, we examine potential security, privacy, and ethical concerns arising from the widespread deployment of these models, emphasizing the need for responsible AI development. Finally, we outline future research directions to enhance the performance, safety, and scalability of DeepSeek models, aiming to foster further advancements in the open-source large model community.
Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
The rapid advancement of artificial intelligence (AI) has ushered in a new era of medical multimodal large language models (MLLMs), which integrate diverse data modalities such as text, imaging, physiological signals, and genomics to enhance clinical decision-making. This systematic review explores the core methodologies and applied research frontiers of medical MLLMs, focusing on their architecture, training methods, evaluation techniques, and applications. We highlight the transformative potential of MLLMs in achieving cross-modal semantic alignment, medical knowledge integration, and robust clinical reasoning. Despite their promise, challenges such as data heterogeneity, hallucination, and computational efficiency persist. By reviewing state-of-the-art solutions and future directions, this paper provides a comprehensive technical guide for developing reliable and interpretable medical MLLMs, ultimately aiming to bridge the gap between AI and clinical practice.
This paper introduces federated services as a smart service ecology with federated security, aligning distributed data supply with diversified service demands spanning digital and societal contexts. It presents comprehensive research on the theoretical foundation and technical system of federated services, aiming to advance the understanding and implementation of this novel service paradigm. First, a thorough examination of the characteristics of federated security within federated services is conducted. Then, a five-layer technical framework is formulated under a decentralized intelligent architecture, ensuring secure, agile, and adaptable service provision. On this basis, the operational mechanisms underlying data federation and service confederation are analyzed, with emphasis on the smart supply-demand matching model. Furthermore, a scenario-oriented taxonomy of federated services, accompanied by illustrative examples, is proposed. Our work offers actionable insights and a roadmap for realizing and advancing federated services, contributing to the refinement and wider adoption of this transformative service paradigm in the digital era.
The rapid advancements in large language models (LLMs) have led to an exponential increase in survey papers, making it challenging to systematically track and analyze their evolving taxonomy. This study employs graph representation learning combined with classical machine learning techniques to model and interpret the structural evolution of LLM-related survey papers. By constructing attributed graphs that capture topic distributions and interconnections, we provide a data-driven framework to explore research trends in this domain. A dataset of 241 survey papers published between July 2021 and January 2024 is analyzed to identify thematic developments and interdisciplinary relationships. The results highlight key areas of specialization, including the emergence of prompting science, multimodal models, and domain-specific applications in finance, education, and law. Co-occurrence analysis of survey topics reveals strong interconnections between core LLM research and fields such as software engineering, hardware architecture, and evaluation methodologies. These findings demonstrate the increasing specialization of LLM research and its growing integration across multiple disciplines. By leveraging graph-based methodologies, this study offers a structured approach to understanding the LLM survey landscape, facilitating efficient navigation of existing literature and identification of emerging research directions. The insights presented contribute to a more comprehensive understanding of the field's trajectory, assisting researchers and practitioners in engaging with the latest developments in LLM research.
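The co-occurrence analysis described in this abstract can be sketched compactly: count how often two topic labels are attached to the same survey paper, and use the counts as edge weights of an attributed graph. The topic labels and toy corpus below are illustrative assumptions, not the authors' 241-paper dataset or exact method.

```python
from collections import Counter
from itertools import combinations

def topic_cooccurrence(papers):
    """Count how often two topics are tagged on the same survey paper.

    `papers` maps a paper id to its set of topic labels; the returned
    Counter maps sorted topic pairs to co-occurrence counts, i.e. the
    weighted edge list of a topic co-occurrence graph.
    """
    edges = Counter()
    for topics in papers.values():
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)] += 1
    return edges

# Illustrative toy corpus (hypothetical labels, not the real dataset).
papers = {
    "survey-01": {"prompting", "evaluation"},
    "survey-02": {"prompting", "multimodal", "evaluation"},
    "survey-03": {"multimodal", "education"},
}

edges = topic_cooccurrence(papers)
print(edges[("evaluation", "prompting")])  # -> 2
```

Strong edges in such a graph are exactly the "interconnections between core LLM research and other fields" the study reports; graph representation learning would then embed the nodes of this weighted graph.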
To combat multipath fading of electromagnetic waves in wireless communication in confined areas, a rectangular tunnel cooperative communication system was established based on the multimode channel model, and a channel capacity formula was derived. Under the criterion of maximizing channel capacity, power allocation methods for both amplify-and-forward (AF) and decode-and-forward (DF) cooperative communication systems were proposed subject to a total power constraint. Mode selection methods for the single input single output (SISO) and single input multiple output (SIMO) models in the rectangular tunnel, through which higher channel capacity can be obtained, were put forward as well. Theoretical analysis and simulation comparison show that the channel capacity of a wireless communication system in a rectangular tunnel can be effectively enhanced through cooperative technology; the channel capacity of the rectangular tunnel under complicated conditions is maximized through the proposed power allocation methods, and the cooperative mode that is optimal for channel capacity can be chosen according to the mode selection methods given in the paper.
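For readers unfamiliar with cooperative capacity expressions, the standard half-duplex relay formulas give the flavor of the quantities being optimized here; the tunnel-specific derivation in the paper additionally folds the multimode propagation model into the per-link SNRs, and the notation below is ours, not the paper's.

```latex
% Per-link SNRs: \gamma_{sd} (source-destination), \gamma_{sr} (source-relay),
% \gamma_{rd} (relay-destination); B is the bandwidth.
% Amplify-and-forward: the relayed path contributes a harmonic-style SNR term.
\gamma_{\mathrm{AF}} = \gamma_{sd} + \frac{\gamma_{sr}\,\gamma_{rd}}{\gamma_{sr} + \gamma_{rd} + 1},
\qquad
C_{\mathrm{AF}} = \frac{B}{2}\log_2\!\left(1 + \gamma_{\mathrm{AF}}\right)

% Decode-and-forward: capacity is limited by the weaker of decoding at the
% relay and combining at the destination.
C_{\mathrm{DF}} = \frac{B}{2}\min\left\{\log_2\!\left(1+\gamma_{sr}\right),\;
\log_2\!\left(1+\gamma_{sd}+\gamma_{rd}\right)\right\}
```

The factor 1/2 reflects the two-slot (broadcast then relay) protocol; the power allocation problem then splits the total budget between source and relay to maximize these expressions.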
Funding: We acknowledge funding from NSFC Grant 62306283.
Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the last two decades. Recently, transformer-based Pre-trained Language Models (PLMs) have excelled in Natural Language Processing (NLP) tasks by leveraging large-scale training corpora. Increasing the scale of these models enhances performance significantly, introducing abilities such as in-context learning that smaller models lack. The advancement of Large Language Models, exemplified by the development of ChatGPT, has made significant impacts both academically and industrially, capturing widespread societal interest. This survey provides an overview of the development and prospects from Large Language Models (LLMs) to Large Multimodal Models (LMMs). It first discusses the contributions and technological advancements of LLMs in the field of natural language processing, especially in text generation and language understanding. It then turns to LMMs, which integrate various data modalities such as text, images, and sound, demonstrating advanced capabilities in understanding and generating cross-modal content and paving new pathways for the adaptability and flexibility of AI systems. Finally, the survey highlights the prospects of LMMs in terms of technological development and application potential, while also pointing out challenges in data integration and cross-modal understanding accuracy, providing a comprehensive perspective on the latest developments in this field.
Abstract: Students are considered one of the groups most affected by psychological problems. Given the highly dangerous nature of mental illnesses and the increasingly serious state of global mental health, it is imperative to explore new methods and approaches for the prevention and treatment of mental illnesses. Large multimodal models (LMMs), the most advanced artificial intelligence models (e.g., ChatGPT-4), have brought new hope to the accurate prevention, diagnosis, and treatment of psychiatric disorders. The assistance of these models in the promotion of mental health is critical, as the latter necessitates a strong foundation of medical knowledge and professional skills, emotional support, stigma mitigation, the encouragement of more honest patient self-disclosure, reduced health care costs, improved medical efficiency, and greater mental health service coverage. However, these models must simultaneously address challenges related to health, safety, hallucinations, and ethics. In the future, we should address these challenges by developing relevant usage manuals, accountability rules, and legal regulations; implementing a human-centered approach; and intelligently upgrading LMMs through the deep optimization of such models, their algorithms, and other means. This effort will thus substantially contribute not only to the maintenance of students' health but also to the achievement of global sustainable development goals.
Abstract: Generative Artificial Intelligence (GAI) refers to a class of AI systems capable of creating novel, coherent, and contextually relevant content, such as text, images, audio, and video, based on patterns learned from extensive training datasets. The public release and rapid refinement of large language models (LLMs) like ChatGPT have accelerated the adoption of GAI across various medical specialties, offering new tools for education, clinical simulation, and research. Dermatology training, which heavily relies on visual pattern recognition and requires extensive exposure to diverse morphological presentations, faces persistent challenges such as uneven distribution of educational resources, limited patient exposure for rare conditions, and variability in teaching quality. Exploring the integration of GAI into pedagogical frameworks offers innovative approaches to address these challenges, potentially enhancing the quality, standardization, scalability, and accessibility of dermatology education. This comprehensive review examines the core concepts and technical foundations of GAI, highlights its specific applications within dermatology teaching and learning, including simulated case generation, personalized learning pathways, and academic support, and discusses the current limitations, practical challenges, and ethical considerations surrounding its use. The aim is to provide a balanced perspective on the significant potential of GAI for transforming dermatology education and to offer evidence-based insights to guide future exploration, implementation, and policy development.
Funding: Supported by a grant from the Ministry of Research, Innovation and Digitization, CNCS/CCCDI-UEFISCDI, project number COFUND-CETP-SMART-LEM-1, within PNCDI IV.
Abstract: Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, the correlation between different parts of the documents, and domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving essential to analyze content and answer questions accurately. This paper aims to evaluate the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art vision-language models, is used to identify the relevant page containing the appropriate chart for each question. Google's Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visually rich document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organisation and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected due to their multimodal nature, combining dense textual analysis with varied visual representations, thus presenting realistic challenges for vision-language models. This comparison is aimed at assessing how advanced models perform with different input formats and at determining whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
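ColPali ranks candidate pages by late interaction (ColBERT-style MaxSim): each query-token embedding is matched against its best page-patch embedding, and the per-token maxima are summed. A minimal numpy sketch of that scoring rule, with random vectors standing in for real model outputs (shapes and dimensions below are illustrative, not ColPali's actual configuration):

```python
import numpy as np

def maxsim_score(query_tokens, page_patches):
    """Late-interaction score: for every query-token embedding, take its
    best dot product over the page's patch embeddings, then sum over
    query tokens."""
    sims = query_tokens @ page_patches.T          # (n_tokens, n_patches)
    return sims.max(axis=1).sum()

def retrieve_best_page(query_tokens, pages):
    """Return the index of the highest-scoring page."""
    scores = [maxsim_score(query_tokens, p) for p in pages]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))                        # 8 query tokens
pages = [rng.normal(size=(196, 128)) for _ in range(5)]  # 5 PDF pages
best = retrieve_best_page(query, pages)
```

The retrieved page image is then handed to the multimodal model for question answering, which is the "images derived from PDF documents" branch of the comparison.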
Funding: The Natural Science Foundation of Hebei Province (F2024501044).
Abstract: The application of visual-language large models in the field of medical health has gradually become a research focus. These models combine image understanding with natural language processing and can simultaneously process multi-modality data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively realizing the integration of medical information and providing strong support for clinical decision-making and disease diagnosis. Visual-language large models perform well on specific medical tasks and also show strong potential and high intelligence as general task models. This paper provides a comprehensive review of visual-language large models in the field of medical health. Specifically, it first introduces the theoretical basis and technical principles. It then introduces specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed, and four potential future development directions are given in light of these challenges.
Funding: Supported by the Technology Research and Development Program of China National Railway Group (Q2024T002) and the Open Project Fund of the National Engineering Research Center of Digital Construction and Evaluation Technology of Urban Rail Transit (2024023).
Abstract: Automated detection of suspended anomalous objects on high-speed railway catenary systems using computer vision-based technology is a critical task for ensuring railway transportation safety. Despite its importance, conventional vision-based foreign object detection methodologies have predominantly concentrated on image data, neglecting the exploration and integration of textual information. The currently popular multimodal model Contrastive Language-Image Pre-training (CLIP) employs contrastive learning to enable simultaneous understanding of both visual and textual modalities. Drawing inspiration from CLIP's capabilities, this paper introduces a novel CLIP-based multimodal foreign object detection model tailored for railway applications, referred to as Railway-CLIP. This model leverages CLIP's robust generalization capabilities to enhance performance in the context of catenary foreign object detection. The Railway-CLIP model is primarily composed of an image encoder and a text encoder. Initially, the Segment Anything Model (SAM) is employed to preprocess raw images, identifying candidate bounding boxes that may contain foreign objects. Both the original images and the detected candidate bounding boxes are subsequently fed into the image encoder to extract their respective visual features. In parallel, distinct prompt templates are crafted for both the original images and the candidate bounding boxes to serve as textual inputs. These prompts are then processed by the text encoder to derive textual features. The image and text encoders collaboratively project the multimodal features into a shared semantic space, facilitating the computation of similarity scores between visual and textual representations. The final detection results are determined based on these similarity scores, ensuring a robust and accurate identification of anomalous objects.
Extensive experiments on our collected Railway Anomaly Dataset (RAD) demonstrate that the proposed Railway-CLIP outperforms previous state-of-the-art methods, achieving 97.25% AUROC and 92.66% F1-score, thereby validating the effectiveness and superiority of the proposed approach in real-world high-speed railway anomaly detection scenarios.
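The decision rule at the end of this pipeline (project region and prompt features into a shared space, compare by cosine similarity, take the best match) can be sketched without the encoders themselves. The 2-D vectors below are synthetic placeholders for SAM/CLIP outputs, and the label strings are invented for illustration:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def classify_region(region_feat, prompt_feats, labels):
    """Assign a candidate bounding-box feature the label of the most
    similar text-prompt feature in the shared embedding space."""
    sims = [cosine(region_feat, p) for p in prompt_feats]
    i = int(np.argmax(sims))
    return labels[i], sims[i]

# Synthetic stand-ins for encoder outputs (illustrative only).
prompts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
labels = ["normal catenary", "foreign object"]
region = np.array([0.1, 0.9])

label, score = classify_region(region, prompts, labels)
print(label)  # prints: foreign object
```

In the real system the candidate regions come from SAM and the prompt features from engineered templates; thresholding the winning score would separate confident detections from background.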
Funding: Partly funded by the National Natural Science Foundation of China under Grant 61701179.
Abstract: Sarcasm detection in Natural Language Processing (NLP) has become increasingly important, particularly with the rise of social media and non-textual emotional expressions, such as images. Existing methods often rely on separate image and text modalities, which may not fully utilize the information available from both sources. To address this limitation, we propose a novel multimodal large model, the PKME-MLM (Prior Knowledge and Multi-label Emotion analysis based Multimodal Large Model for sarcasm detection). The PKME-MLM aims to enhance sarcasm detection by integrating prior knowledge to extract useful textual information from images, which is then combined with text data for deeper analysis. This method improves the integration of image and text data, addressing the limitation of previous models that process these modalities separately. Additionally, we incorporate multi-label sentiment analysis, refining sentiment labels to improve sarcasm recognition accuracy. This design overcomes the limitations of prior models that treated sentiment classification as a single-label problem, thereby improving sarcasm recognition by distinguishing subtle emotional cues in the text. Experimental results demonstrate that our approach achieves significant performance improvements in multimodal sarcasm detection tasks, with an accuracy (Acc.) of 94.35%, and Macro-Average Precision and Recall reaching 93.92% and 94.21%, respectively. These results highlight the potential of multimodal models in improving sarcasm detection and suggest that further integration of modalities could advance future research. This work also paves the way for incorporating multimodal sentiment analysis into sarcasm detection.
Funding: Supported by the Open Project Program of Panxi Crops Research and Utilization Key Laboratory of Sichuan Province (No. SZKF202302) and the Fundamental Research Funds for the Central Universities (No. 2019CDYGYB024).
Abstract: Gastrointestinal (GI) cancers represent a major global health concern due to their high incidence and mortality rates. Foundation models (FMs), also referred to as large models, represent a novel class of artificial intelligence technologies that have demonstrated considerable potential in addressing these challenges. These models encompass large language models (LLMs), vision FMs (VFMs), and multimodal LLMs (MLLMs), all of which utilize transformer architectures and self-supervised pre-training on extensive unlabeled datasets to achieve robust cross-domain generalization. This review delineates the principal applications of these models: LLMs facilitate the structuring of clinical narratives, extraction of insights from medical records, and enhancement of physician-patient communication; VFMs are employed in the analysis of endoscopic, radiological, and pathological images for lesion detection and staging; MLLMs integrate heterogeneous data modalities, including imaging, textual information, and genomic data, to support diagnostic processes, treatment prediction, and prognostic evaluation. Despite these promising developments, several challenges remain, such as the need for data standardization, limited diversity within training datasets, substantial computational resource requirements, and ethical-legal concerns. In conclusion, FMs exhibit significant potential to advance research and clinical management of GI cancers. Future research efforts should prioritize the refinement of these models, promote international collaborations, and adopt interdisciplinary approaches. Such a comprehensive strategy is essential to fully harness the capabilities of FMs, driving substantial progress in the fight against GI malignancies.
Funding: Supported by the China Health Promotion Foundation Young Doctors' Research Foundation for Inflammatory Bowel Disease; the Taishan Scholars Program of Shandong Province, China (No. tsqn202306343); and the National Natural Science Foundation of China (No. 82270580, No. 82070552, No. 82270578, and No. 82300599).
Abstract: BACKGROUND: Gastrointestinal diseases have complex etiologies and clinical presentations. An accurate diagnosis requires physicians to integrate diverse information, including medical history, laboratory test results, and imaging findings. Existing artificial intelligence-assisted diagnostic tools are limited to single-modality information, resulting in recommendations that are often incomplete and may be associated with clinical or legal risks. AIM: To develop and evaluate a collaborative multimodal large language model (LLM) framework for clinical decision-making in digestive diseases. METHODS: In this observational study, DeepGut, a multimodal LLM collaborative diagnostic framework, was developed to integrate four distinct large models into a four-tiered structure. The framework sequentially accomplishes multimodal information extraction, logical "chain" construction, diagnostic and treatment suggestion generation, and risk analysis. The model was evaluated using objective metrics, which assess the reliability and comprehensiveness of model-generated results, and subjective expert opinions, which examine the effectiveness of the framework in assisting physicians. RESULTS: The diagnostic and treatment recommendations generated by the DeepGut framework achieved exceptional performance, with a diagnostic accuracy of 97.8%, diagnostic completeness of 93.9%, treatment plan accuracy of 95.2%, and treatment plan completeness of 98.0%, significantly surpassing the capabilities of single-modal LLM-based diagnostic tools. Experts evaluating the framework commended the completeness, relevance, and logical coherence of its outputs. However, the collaborative multimodal LLM approach resulted in increased input and output token counts, leading to higher computational costs and extended diagnostic times. CONCLUSION: The framework successfully integrates multimodal diagnostic data, demonstrating enhanced performance enabled by multimodal LLM collaboration and opening new horizons for the clinical application of artificial intelligence-assisted technology.
Abstract: Large language models (LLMs) have emerged as transformative tools in radiology artificial intelligence (AI), offering significant capabilities in areas such as image report generation, clinical decision support, and workflow optimization. The first part of this manuscript presents a comprehensive overview of the current state of LLM applications in radiology, including their historical evolution, technical foundations, and practical uses. Despite notable advances, inherent architectural constraints, such as token-level sequential processing, limit their ability to perform deep abstract reasoning and holistic contextual understanding, which are critical for fine-grained diagnostic interpretation. We provide a critical perspective on current LLMs and discuss key challenges, including model reliability, bias, and explainability, highlighting the pressing need for novel approaches to advance radiology AI. Large concept models (LCMs) represent a nascent and promising paradigm in radiology AI, designed to transcend the limitations of token-level processing by utilizing higher-order conceptual representations and multimodal data integration. The second part of this manuscript introduces the foundational principles and theoretical framework of LCMs, highlighting their potential to facilitate enhanced semantic reasoning, long-range context synthesis, and improved clinical decision-making. Critically, the core of this section is the proposal of a novel theoretical framework for LCMs, formalized and extended from our group's foundational concept-based models, the world's earliest articulation of this paradigm for medical AI. This conceptual shift has since been externally validated and propelled by the recent publication of the LCM architectural proposal by Meta AI, providing a large-scale engineering blueprint for the future development of this technology. We also outline future research directions and the transformative implications of this emerging AI paradigm for radiologic practice, aiming to provide a blueprint for advancing toward human-like conceptual understanding in AI. While challenges persist, we are at the very beginning of a new era, and it is not unreasonable to hope that future advancements will overcome these hurdles, pushing the boundaries of AI in radiology far beyond even the most state-of-the-art models of today.
Abstract: User identity linkage (UIL) refers to identifying user accounts belonging to the same identity across different social media platforms. Most current research is based on text analysis, which fails to fully exploit the rich image resources generated by users, and existing attempts in the multimodal domain still face the challenge of semantic differences between text and images. Given this, we investigate the UIL task across different social media platforms based on multimodal user-generated contents (UGCs). We introduce the efficient user identity linkage via aligned multi-modal features and temporal correlation (EUIL) approach. The method first generates captions for user-posted images with the BLIP model, alleviating the problem of missing textual information. Subsequently, we extract aligned text and image features with the CLIP model, which closely aligns the two modalities and significantly reduces the semantic gap. Accordingly, we construct a set of adapter modules to integrate the multimodal features. Furthermore, we design a temporal weight assignment mechanism to incorporate the temporal dimension of user behavior. We evaluate the proposed scheme on the real-world social dataset TWIN, and the results show that our method reaches 86.39% accuracy, which demonstrates its excellence in handling multimodal data and provides strong algorithmic support for UIL.
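The abstract does not specify EUIL's exact temporal weighting scheme, so the sketch below uses a common assumption, exponential decay by post age, to show how per-post cross-platform similarities could be aggregated into one linkage score; all numbers are invented:

```python
def temporal_weights(timestamps, now, half_life_days=30.0):
    """Exponentially down-weight older posts: a post `half_life_days`
    old contributes half as much as a fresh one. Weights sum to 1."""
    raw = [0.5 ** ((now - t) / half_life_days) for t in timestamps]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_similarity(sims, weights):
    """Aggregate per-post cross-platform feature similarities into a
    single account-pair linkage score."""
    return sum(s * w for s, w in zip(sims, weights))

# Posts 0, 30, and 60 days old; per-post multimodal feature similarities.
w = temporal_weights([100.0, 70.0, 40.0], now=100.0)
score = weighted_similarity([0.9, 0.8, 0.4], w)
print(round(score, 3))  # -> 0.8
```

Recent behavior dominates the score, which matches the intuition that a user's current posting style is the most reliable linkage signal; a threshold on the score would then decide whether two accounts are linked.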
基金supported by the National Key Research and Development Program of China(2023YFF0906502)the Postgraduate Research and Innovation Project of Hunan Province under Grant(CX20240473).
Abstract: With the trend toward digital transformation among cultural institutions and the substantial influence of social media platforms, the demand for visual communication keeps increasing for promoting traditional cultural artifacts online. As an effective medium, posters serve to attract public attention and facilitate broader engagement with cultural artifacts. However, existing poster generation methods mainly rely on fixed templates and manual design, which limits their scalability and adaptability to the diverse visual and semantic features of the artifacts. Therefore, we propose CAPGen, an automated aesthetic Cultural Artifacts Poster Generation framework built on a Multimodal Large Language Model (MLLM) with integrated iterative optimization. During our research, we collaborated with designers to define principles of graphic design for cultural artifact posters to guide the MLLM in generating layout parameters. These parameters were then rendered into posters. Finally, we refined the posters using an MLLM integrated with a multi-round iterative optimization mechanism. Qualitative results show that CAPGen consistently outperforms baseline methods in both visual quality and aesthetic performance. Furthermore, ablation studies indicate that the prompt, the iterative optimization mechanism, and the design principles significantly enhance the effectiveness of poster generation.
Funding: Supported by the National Natural Science Foundation of China (72088101, 42372175) and the PetroChina Science and Technology Innovation Fund Program (2021DQ02-0904).
Abstract: This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be briefly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges such as data quantity and quality that are currently insufficient to support the training of large models, high research and development costs, and poor algorithmic autonomy and controllability. The application of large models should be guided by the needs of the oil and gas business, taking it as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the building of "artificial intelligence + energy" composite teams, and boost the autonomy and controllability of large model technology.
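Retrieval-augmented generation, mentioned above as one route enterprises have taken, grounds an LLM's answer in documents fetched at query time instead of relying on fine-tuned weights alone. A deliberately simple sketch using word overlap in place of the embedding search a production system would use; the document strings are invented examples, not any enterprise's corpus:

```python
def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a toy stand-in
    for embedding-based similarity search) and return the top k."""
    q = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

# Invented snippets of a domain corpus.
docs = [
    "core analysis with pre-trained foundation models",
    "seismic data processing and interpretation",
    "drilling report summarization",
]

context = retrieve("foundation models for core analysis", docs)
# The retrieved context would be prepended to the LLM prompt:
prompt = "Answer using only this context:\n" + "\n".join(context)
print(context[0])  # -> core analysis with pre-trained foundation models
```

Because the knowledge lives in the retrieved documents, a RAG deployment lets the answers track an updated corpus without retraining, which is part of its appeal for proprietary oil and gas data.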
Funding: Funded by the MUREP High Volume project (80NSSC22M0132) through the U.S. NASA Office of STEM Engagement and the SMART IAC Project (DE-EE0009726) through the U.S. Department of Energy Office of Manufacturing and Energy Supply Chains, with support from the San Diego Supercomputer Center (SDSC) National Research Platform (NRP) Nautilus, sponsored by the U.S. NSF (2100237, 2120019).
Abstract: The additive manufacturing (AM) landscape has significantly transformed in alignment with Industry 4.0 principles, primarily driven by the integration of artificial intelligence (AI) and digital twins (DT). However, current intelligent AM (IAM) systems face limitations such as fragmented AI tool usage and suboptimal human-machine interaction. This paper reviews existing IAM solutions, emphasizing control, monitoring, process autonomy, and end-to-end integration, and identifies key limitations, such as the absence of a high-level controller for global decision-making. To address these gaps, we propose a transition from IAM to autonomous AM, featuring a hierarchical framework with four integrated layers: knowledge, generative solution, operational, and cognitive. In the cognitive layer, AI agents notably enable machines to independently observe, analyze, plan, and execute operations that traditionally require human intervention. These capabilities streamline production processes and expand the possibilities for innovation, particularly in sectors like in-space manufacturing. Additionally, this paper discusses the role of AI in self-optimization and lifelong learning, positing that the future of AM will be characterized by a symbiotic relationship between human expertise and advanced autonomy, fostering a more adaptive, resilient manufacturing ecosystem.
Abstract: The rapid advancement of large models has led to the development of increasingly sophisticated models capable of generating diverse, personalized, and high-quality content. Among these, DeepSeek has emerged as a pivotal open-source initiative, demonstrating high performance at significantly lower computation costs compared to closed-source counterparts. This survey provides a comprehensive overview of the DeepSeek family of models, including DeepSeek-V3 and DeepSeek-R1, covering their core innovations in architecture, system pipeline, algorithm, and infrastructure. We explore their practical applications across various domains, such as healthcare, finance, and education, highlighting their impact on both industry and society. Furthermore, we examine potential security, privacy, and ethical concerns arising from the widespread deployment of these models, emphasizing the need for responsible AI development. Finally, we outline future research directions to enhance the performance, safety, and scalability of DeepSeek models, aiming to foster further advancements in the open-source large model community.
Funding: Supported by the Guangdong Provincial Science and Technology Program (Grant No. 2023A0505030003).
Abstract: Can current robotic technologies truly replicate the full scope and intricacies of human labour? In practice, the adoption of robots remains limited, especially in open, unstructured environments commonly encountered in everyday scenarios such as services, healthcare, agriculture, construction, and numerous other fields. From the perspective of general robotic manipulation, the challenges arise from three factors. (1) High operational barriers: human operators are obliged to master specialized robotic programming languages and gain a deep understanding of the tasks at hand. These tasks need to be broken down into action-level robotic programs, which results in high labour costs. (2) Limited autonomous task execution: robots lack the capability to independently plan and execute the actions required to achieve target tasks. This limitation renders them unsuitable for deployment in open, unstructured environments that demand sophisticated interaction and seamless collaboration with humans.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62172458).
Funding: Supported by the National Key Research and Development Program of China (2021YFB2104800), the National Natural Science Foundation of China (62103411, 62436010, 72171230), and the Science and Technology Development Fund of Macao SAR (0093/2023/RIA2, 0050/2020/A1).
Abstract: This paper introduces federated services as a smart service ecology with federated security, aligning distributed data supply with diversified service demands spanning digital and societal contexts. It presents comprehensive research on the theoretical foundation and technical system of federated services, aiming to advance our understanding and implementation of this novel service paradigm. First, a thorough examination of the characteristics of federated security within federated services is conducted. Then, a five-layer technical framework is formulated under a decentralized intelligent architecture, ensuring secure, agile, and adaptable service provision. On this basis, the operational mechanisms underlying data federation and service confederation are analyzed, with emphasis on the smart supply-demand matching model. Furthermore, a scenario-oriented taxonomy of federated services, accompanied by illustrative examples, is proposed. Our work offers actionable insights and a roadmap for realizing and advancing federated services, contributing to the refinement and wider adoption of this transformative service paradigm in the digital era.
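The supply-demand matching idea can be illustrated with a toy sketch. The suppliers, services, and capability labels below are hypothetical, and the scoring rule (Jaccard overlap with greedy assignment) is only a stand-in; the paper's smart supply-demand matching model is not specified at this level of detail.

```python
# Hypothetical data-supply offers and service demands, each described
# by a set of capability labels.
supplies = {
    "hospital_A": {"imaging", "ehr"},
    "lab_B":      {"genomics"},
    "city_C":     {"traffic", "imaging"},
}
demands = {
    "diagnosis_service": {"imaging", "ehr"},
    "mobility_service":  {"traffic"},
}

def match(demands, supplies):
    """Greedily assign each demand to the supplier whose capabilities
    overlap it best, scored by Jaccard similarity."""
    assignment = {}
    for d, need in demands.items():
        def score(s):
            have = supplies[s]
            return len(need & have) / len(need | have)
        assignment[d] = max(supplies, key=score)
    return assignment

print(match(demands, supplies))
# {'diagnosis_service': 'hospital_A', 'mobility_service': 'city_C'}
```

A production matcher would add the security, trust, and decentralization constraints the paper's five-layer framework is concerned with; the point here is only the supply-demand scoring loop.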
Abstract: The rapid advancements in large language models (LLMs) have led to an exponential increase in survey papers, making it challenging to systematically track and analyze their evolving taxonomy. This study employs graph representation learning combined with classical machine learning techniques to model and interpret the structural evolution of LLM-related survey papers. By constructing attributed graphs that capture topic distributions and interconnections, we provide a data-driven framework to explore research trends in this domain. A dataset of 241 survey papers published between July 2021 and January 2024 is analyzed to identify thematic developments and interdisciplinary relationships. The results highlight key areas of specialization, including the emergence of prompting science, multimodal models, and domain-specific applications in finance, education, and law. Co-occurrence analysis of survey topics reveals strong interconnections between core LLM research and fields such as software engineering, hardware architecture, and evaluation methodologies. These findings demonstrate the increasing specialization of LLM research and its growing integration across multiple disciplines. By leveraging graph-based methodologies, this study offers a structured approach to understanding the LLM survey landscape, facilitating efficient navigation of existing literature and identification of emerging research directions. The insights presented contribute to a more comprehensive understanding of the field's trajectory, assisting researchers and practitioners in engaging with the latest developments in LLM research.
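The topic co-occurrence analysis described above can be sketched with nothing but the standard library. The paper titles and topic labels below are hypothetical stand-ins for the study's 241-paper dataset, and weighted degree is used here as one simple centrality measure, not necessarily the study's own.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy records standing in for the surveyed papers.
papers = [
    {"title": "Prompting survey",   "topics": ["prompting", "evaluation"]},
    {"title": "Multimodal survey",  "topics": ["multimodal", "evaluation"]},
    {"title": "LLM4SE survey",      "topics": ["software engineering", "prompting"]},
    {"title": "Finance LLM survey", "topics": ["finance", "evaluation"]},
]

# Edge weights of the topic co-occurrence graph: how often two
# topics are assigned to the same paper.
edges = Counter()
for p in papers:
    for a, b in combinations(sorted(p["topics"]), 2):
        edges[(a, b)] += 1

# Weighted degree per topic: a simple centrality that highlights
# "hub" topics bridging otherwise separate subfields.
degree = defaultdict(int)
for (a, b), w in edges.items():
    degree[a] += w
    degree[b] += w

hub = max(degree, key=degree.get)
print(hub)  # "evaluation" in this toy example
```

On real data, the same graph would be enriched with node attributes (topic distributions, publication dates) and fed to a graph representation learner, which is the attributed-graph setup the study describes.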
Funding: Financial support provided by the National Natural Science Foundation of China (No. 51274202), the Fundamental Research Funds for the Central Universities (No. 2013RC11), the Science and Technology Achievements Transformation Project of Jiangsu Province (No. BA2012068), the Natural Science Foundation of Jiangsu Province (Nos. BK20130199 and BK20131124), the Ceeusro Prospective Joint Research Project of Jiangsu Province (No. BY2014028-01), and the Great Cultivating Special Project at China University of Mining and Technology (No. 2014ZDPY16).
Abstract: To address the multipath fading of electromagnetic waves in wireless communication within confined areas, a cooperative communication system for rectangular tunnels was established based on the multimode channel model, and the channel capacity formula was derived. Under the channel-capacity-optimal criterion, power allocation methods for both amplify-and-forward (AF) and decode-and-forward (DF) cooperative communication systems were proposed to maximize the channel capacity subject to a total power constraint. Mode selection methods between the single-input single-output (SISO) and single-input multiple-output (SIMO) models in the rectangular tunnel, through which higher channel capacity can be obtained, were put forward as well. Theoretical analysis and simulation comparisons show that the channel capacity of the wireless communication system in the rectangular tunnel can be effectively enhanced through cooperative technology; the channel capacity of the rectangular tunnel under complicated conditions is maximized through the proposed power allocation methods, and the cooperative mode that is optimal for channel capacity can be chosen according to the mode selection methods given in the paper.
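As a hedged illustration of capacity-maximizing power allocation, the sketch below uses the textbook AF relay capacity expression with a direct link and hypothetical channel gains; the paper's own formulas are derived from the multimode rectangular-tunnel channel model and may differ in detail.

```python
import math

def af_capacity(p_s, p_r, g_sd, g_sr, g_rd, noise=1.0):
    """Textbook amplify-and-forward capacity with a direct link.

    p_s / p_r are source and relay transmit powers; g_* are hypothetical
    source-destination, source-relay, and relay-destination channel gains.
    The 1/2 factor accounts for the two-phase relaying protocol."""
    snr_sd = p_s * g_sd / noise
    snr_sr = p_s * g_sr / noise
    snr_rd = p_r * g_rd / noise
    relay_snr = snr_sr * snr_rd / (snr_sr + snr_rd + 1.0)
    return 0.5 * math.log2(1.0 + snr_sd + relay_snr)

def best_power_split(p_total, g_sd, g_sr, g_rd, steps=1000):
    """Grid search over the source/relay power split under a total-power cap."""
    return max(
        (af_capacity(a * p_total, (1 - a) * p_total, g_sd, g_sr, g_rd), a)
        for a in (i / steps for i in range(1, steps))
    )  # (capacity in bit/s/Hz, fraction of total power given to the source)

cap, alpha = best_power_split(p_total=2.0, g_sd=0.2, g_sr=1.0, g_rd=1.5)
print(f"capacity={cap:.3f} bit/s/Hz at source fraction {alpha:.3f}")
```

The optimized split always performs at least as well as naively halving the power between source and relay, which is the benefit the power allocation methods in the paper formalize for the tunnel channel.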