In the era of AI, especially large models, the importance of open source has become increasingly prominent. First, open source allows innovation to avoid starting from scratch. Through iterative innovation, it promotes technical exchanges and learning globally. Second, resources required for large model R&D are difficult for a single institution to obtain. The evaluation of general large models also requires the participation of experts from various industries. Third, without open source collaboration, it is difficult to form a unified upper-layer software ecosystem. Therefore, open source has become an important cooperation mechanism to promote the development of AI and large models. There are two cases to illustrate how open source and international standards interact with each other.
Sarcasm detection in Natural Language Processing (NLP) has become increasingly important, particularly with the rise of social media and non-textual emotional expressions, such as images. Existing methods often rely on separate image and text modalities, which may not fully utilize the information available from both sources. To address this limitation, we propose a novel multimodal large model, i.e., the PKME-MLM (Prior Knowledge and Multi-label Emotion analysis based Multimodal Large Model for sarcasm detection). The PKME-MLM aims to enhance sarcasm detection by integrating prior knowledge to extract useful textual information from images, which is then combined with text data for deeper analysis. This method improves the integration of image and text data, addressing the limitation of previous models that process these modalities separately. Additionally, we incorporate multi-label sentiment analysis, refining sentiment labels to improve sarcasm recognition accuracy. This design overcomes the limitations of prior models that treated sentiment classification as a single-label problem, thereby improving sarcasm recognition by distinguishing subtle emotional cues from the text. Experimental results demonstrate that our approach achieves significant performance improvements in multimodal sarcasm detection tasks, with an accuracy (Acc.) of 94.35%, and Macro-Average Precision and Recall reaching 93.92% and 94.21%, respectively. These results highlight the potential of multimodal models in improving sarcasm detection and suggest that further integration of modalities could advance future research. This work also paves the way for incorporating multimodal sentiment analysis into sarcasm detection.
Funding: supported partly by the National Natural Science Foundation of China under grant number 61701179.
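As a rough illustration of the two ideas this abstract highlights, fusing image-derived text with the original post and treating sentiment as a multi-label problem alongside a binary sarcasm decision, here is a minimal PyTorch sketch. It is not the authors' PKME-MLM; the module and head names, the six-emotion label set, and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiLabelSarcasmHead(nn.Module):
    """Illustrative two-head classifier: multi-label emotion logits plus a
    binary sarcasm logit, both computed from a shared fused representation."""
    def __init__(self, hidden_dim: int = 768, num_emotions: int = 6):
        super().__init__()
        self.emotion_head = nn.Linear(hidden_dim, num_emotions)  # multi-label
        self.sarcasm_head = nn.Linear(hidden_dim, 1)              # binary

    def forward(self, fused: torch.Tensor):
        return self.emotion_head(fused), self.sarcasm_head(fused).squeeze(-1)

def training_step(fused, emotion_targets, sarcasm_targets, model):
    """Joint loss: BCE-with-logits treats each emotion as an independent label
    (the multi-label reading), plus BCE for the sarcasm decision."""
    emo_logits, sar_logits = model(fused)
    emo_loss = nn.functional.binary_cross_entropy_with_logits(emo_logits, emotion_targets)
    sar_loss = nn.functional.binary_cross_entropy_with_logits(sar_logits, sarcasm_targets)
    return sar_loss + 0.5 * emo_loss  # weighting is an arbitrary illustration

# Toy usage: a batch of 4 "fused" text + image-caption embeddings.
model = MultiLabelSarcasmHead()
fused = torch.randn(4, 768)
emotions = torch.randint(0, 2, (4, 6)).float()
sarcasm = torch.randint(0, 2, (4,)).float()
loss = training_step(fused, emotions, sarcasm, model)
```

In practice the fused vector would come from a multimodal encoder over the post text concatenated with the prior-knowledge caption extracted from the image.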
This study examines the advent of agent interaction (AIx) as a transformative paradigm in human-computer interaction (HCI), signifying a notable evolution beyond traditional graphical interfaces and touchscreen interactions. Within the context of large models, AIx is characterized by its innovative interaction patterns and a plethora of application scenarios that hold great potential. The study underscores the pivotal role of AIx in shaping the future trajectory of the large model industry, emphasizing its adoption and necessity from a user-centric perspective. The fundamental drivers of AIx include the introduction of novel capabilities, replication of capabilities (both anthropomorphic and superhuman), migration of capabilities, aggregation of intelligence, and multiplication of capabilities. These elements are essential for propelling innovation, expanding the frontiers of capability, and realizing the exponential superposition of capabilities, thereby mitigating labor redundancy and addressing a spectrum of human needs. Furthermore, this study provides an in-depth analysis of the structural components and operational mechanisms of agents supported by large models. Such advancements significantly enhance the capacity of agents to tackle complex problems and provide intelligent services, thereby facilitating a more intuitive, adaptive, and personalized engagement between humans and machines. The study further delineates four principal categories of interaction patterns that encompass eight distinct modalities of interaction, corresponding to twenty-one specific scenarios, including applications in smart home systems, health assistance, and elderly care. This emphasizes the significance of the new paradigm in advancing HCI, fostering technological advancements, and redefining user experiences. However, the study also acknowledges the challenges and ethical considerations that accompany this paradigm shift, recognizing the need for a balanced approach to harness the full potential of AIx in modern society.
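The abstract refers to the structural components and operational mechanisms of large-model-backed agents without specifying an implementation. The following sketch, a toy perceive-plan-act loop with memory and a tool registry, is one common way such agents are wired; the Agent class, the prompt format, and the canned smart-home tool are illustrative assumptions, not anything described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Agent:
    """Minimal large-model-backed agent skeleton: an LLM callable for planning,
    a registry of tools (actions), and a running memory of past steps."""
    llm: Callable[[str], str]                  # placeholder for any LLM call
    tools: Dict[str, Callable[[str], str]]
    memory: List[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        prompt = (
            "Memory:\n" + "\n".join(self.memory[-5:]) +
            f"\nObservation: {observation}\n"
            f"Available tools: {list(self.tools)}\n"
            "Reply as '<tool>: <argument>'."
        )
        decision = self.llm(prompt)
        tool_name, _, argument = decision.partition(":")
        result = self.tools.get(tool_name.strip(),
                                lambda a: f"unknown tool: {a}")(argument.strip())
        self.memory.append(f"{observation} -> {decision} -> {result}")
        return result

# Toy run with a canned "LLM" and a single smart-home tool.
agent = Agent(
    llm=lambda prompt: "set_light: living room off",
    tools={"set_light": lambda arg: f"light command executed: {arg}"},
)
print(agent.step("User said: it's too bright in here"))
```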
For computer science majors in higher education institutions, programming courses are among the most important professional foundation courses. Proficiency in independent programming is of great help to the study of subsequent courses and to students' personal development. In the teaching of programming courses, online judge systems are often used to improve students' programming level. Traditional online judge systems lack guidance for students, and it is often difficult for inexperienced students to find and correct errors in their code by themselves. We propose an online judge system that integrates a large model for error correction to help students find errors and improve their programming skills.
Funding: supported by the Research and Construction of Experimental Teaching Aid Platform for Programming under the Teaching Reform Research Project of Shandong University.
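To make the proposal concrete, here is a minimal sketch of how an online judge might be extended with model-generated hints: run the submission against test cases and, on the first failure, ask a language model for a corrective hint rather than returning only a verdict. This is an assumption-level illustration, not the paper's system; judge, the test-case format, and the ask_llm callable are hypothetical names.

```python
import subprocess
import sys

def judge(source_path: str, test_cases, ask_llm=None):
    """Run a Python submission against (stdin, expected_stdout) pairs; on the
    first failure, optionally ask a language model for a corrective hint."""
    for stdin_text, expected in test_cases:
        proc = subprocess.run(
            [sys.executable, source_path],
            input=stdin_text, capture_output=True, text=True, timeout=5,
        )
        actual = proc.stdout.strip()
        if actual != expected.strip():
            verdict = {"status": "wrong answer", "input": stdin_text,
                       "expected": expected, "actual": actual,
                       "stderr": proc.stderr}
            if ask_llm is not None:
                with open(source_path, encoding="utf-8") as f:
                    code = f.read()
                verdict["hint"] = ask_llm(
                    "The following program fails a test.\n"
                    f"Code:\n{code}\nInput:\n{stdin_text}\n"
                    f"Expected:\n{expected}\nGot:\n{actual}\n"
                    "Explain the likely bug without giving the full solution."
                )
            return verdict
    return {"status": "accepted"}
```

The hint is returned alongside the usual verdict, so the student still sees the failing input and output but also gets guided toward the bug.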
Large models, such as large language models (LLMs), vision-language models (VLMs), and multimodal agents, have become key elements in artificial intelligence (AI) systems. Their rapid development has greatly improved perception, generation, and decision-making in various fields. However, their vast scale and complexity bring about new security challenges. Issues such as backdoor vulnerabilities during training, jailbreaking in multimodal reasoning, and data provenance and copyright auditing have made security a critical focus for both academia and industry.
This article elucidates the concept of large model technology, summarizes the research status of large model technology both domestically and internationally, provides an overview of the application status of large models in vertical industries, outlines the challenges and issues confronted in applying large models in the oil and gas sector, and offers prospects for the application of large models in the oil and gas industry. Existing large models can be briefly divided into three categories: large language models, visual large models, and multimodal large models. The application of large models in the oil and gas industry is still in its infancy. Based on open-source large language models, some oil and gas enterprises have released large language model products using methods such as fine-tuning and retrieval-augmented generation. Scholars have attempted to develop scenario-specific models for oil and gas operations by using visual/multimodal foundation models. A few researchers have constructed pre-trained foundation models for seismic data processing and interpretation, as well as core analysis. The application of large models in the oil and gas industry faces challenges such as data quantity and quality that are currently insufficient to support large model training, high research and development costs, and poor algorithm autonomy and controllability. The application of large models should be guided by the needs of the oil and gas business, taking the application of large models as an opportunity to improve data lifecycle management, enhance data governance capabilities, promote the construction of computing power, strengthen the building of "artificial intelligence + energy" composite teams, and boost the autonomy and controllability of large model technology.
Funding: supported by the National Natural Science Foundation of China (72088101, 42372175) and the PetroChina Science and Technology Innovation Fund Program (2021DQ02-0904).
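Since the abstract names fine-tuning and retrieval-augmented generation (RAG) as the main routes oil and gas enterprises have taken, the sketch below illustrates only the RAG pattern: retrieve domain passages for a query and condition the generator on them. The word-overlap retriever, the toy corpus, and the canned llm callable are stand-ins assumed for illustration, not anything from the surveyed products.

```python
from collections import Counter

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank domain documents by naive word-overlap with the query; a stand-in
    for the dense vector retrieval a production RAG system would use."""
    q = Counter(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: sum((Counter(d.lower().split()) & q).values()),
                    reverse=True)
    return scored[:k]

def answer_with_rag(query: str, documents: list[str], llm) -> str:
    """Build a prompt from the retrieved context and let the model answer."""
    context = "\n".join(retrieve(query, documents))
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return llm(prompt)

# Toy corpus of oil-and-gas notes and a canned "LLM".
docs = [
    "Daily drilling reports record bit depth, mud weight and rate of penetration.",
    "Seismic interpretation maps subsurface horizons from reflection data.",
]
print(answer_with_rag("What does a drilling report record?", docs,
                      llm=lambda p: "Bit depth, mud weight and rate of penetration."))
```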
The emergence of artificial intelligence natural language large models has brought a new dawn for the in-depth empowerment of the industry. Research on the key technologies and applications of a railway natural language large model is of great significance to promoting and coordinating the development of railway artificial intelligence. This paper puts forward the application scenarios of a railway natural language large model according to the application requirements of railway artificial intelligence; designs the overall architecture of the railway natural language large model by relying on the railway artificial intelligence platform; studies the key technologies of natural language large models; builds a railway industry large model oriented to intelligent question answering and verifies the model with actual data; finally, the paper considers the prospects for the development and application of the railway natural language large model in railway traffic organization, railway operation safety, and passenger service.
Following the groundbreaking introduction of the Transformer architecture in 2017, the development of Large Language Models (LLMs) formally commenced. In May 2020, GPT-3, with over one hundred billion parameters, entered the public eye, marking a significant milestone in LLM advancement.
To improve the accuracy and generalization of well logging curve reconstruction, this paper proposes an artificial intelligence large language model, "Gaia", and conducts model evaluation experiments. By fine-tuning the pre-trained large language model, Gaia significantly improved its ability to extract sequential patterns and spatial features from well-log curves. Leveraging the adapter method for fine-tuning, the model required training only about 1/70 of its original parameters, greatly improving training efficiency. Comparative experiments, ablation experiments, and generalization experiments were designed and conducted using well-log data from 250 wells. In the comparative experiment, the Gaia model was benchmarked against cutting-edge small deep learning models and conventional large language models, demonstrating that the Gaia model reduced the mean absolute error (MAE) by at least 20%. In the ablation experiments, the synergistic effect of the Gaia model's multiple components was validated, with its MAE being at least 30% lower than that of single-component models. In the generalization experiments, the superior performance of the Gaia model in blind-well predictions was further confirmed. Compared to traditional models, the Gaia model is significantly superior in accuracy and generalization for logging curve reconstruction, fully showcasing the potential of large language models in the field of well logging. This provides a new approach for future intelligent logging data processing.
Funding: supported by the National Natural Science Foundation of China (52288101) and the National Key R&D Program of China (2024YFF1500600).
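The efficiency claim here rests on adapter-based fine-tuning: small bottleneck modules are inserted and trained while the pretrained backbone stays frozen, so only a fraction of parameters receive gradients. The PyTorch sketch below shows that mechanism in a toy form; the bottleneck size, the linear "backbone", and the resulting trainable fraction are illustrative assumptions and do not reproduce Gaia's roughly 1/70 figure.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.
    Only these small layers are trained; the backbone stays frozen."""
    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wrap a frozen backbone block and follow it with a trainable adapter."""
    def __init__(self, backbone_block: nn.Module, dim: int):
        super().__init__()
        self.backbone_block = backbone_block
        self.adapter = Adapter(dim)

    def forward(self, x):
        return self.adapter(self.backbone_block(x))

# Toy "pretrained" backbone: freeze it, then wrap each block with an adapter.
dim = 256
backbone = nn.ModuleList([nn.Linear(dim, dim) for _ in range(8)])
for p in backbone.parameters():
    p.requires_grad = False
model = nn.Sequential(*[AdaptedBlock(b, dim) for b in backbone])

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.3%}")  # small, adapter-only
```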
The unprecedented scale of large models, such as large language models (LLMs) and text-to-image diffusion models, has raised critical concerns about the unauthorized use of copyrighted data during model training. These concerns have spurred a growing demand for dataset copyright auditing techniques, which aim to detect and verify potential infringements in the training data of commercial AI systems. This paper presents a survey of existing auditing solutions, categorizing them across key dimensions: data modality, model training stage, data overlap scenarios, and model access levels. We highlight major trends, including the prevalence of black-box auditing methods and the emphasis on fine-tuning rather than pre-training. Through an in-depth analysis of 12 representative works, we extract four key observations that reveal the limitations of current methods. Furthermore, we identify three open challenges and propose future directions for robust, multimodal, and scalable auditing solutions. Our findings underscore the urgent need to establish standardized benchmarks and develop auditing frameworks that are resilient to low watermark densities and applicable in diverse deployment settings.
Funding: supported in part by NSFC under Grant Nos. 62402379, U22A2029, and U24A20237.
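Many of the black-box audits such surveys cover reduce to a membership-style signal: a model tends to be more confident on data it was trained on. The sketch below expresses that idea as a simple loss-gap test; the effect-size threshold, the loss_fn interface, and the toy numbers are assumptions for illustration, not a method taken from the surveyed papers.

```python
import statistics

def audit_membership(loss_fn, suspected_samples, reference_samples):
    """Black-box auditing heuristic: if the model assigns systematically lower
    loss (higher confidence) to the suspected copyrighted samples than to
    comparable unseen reference samples, flag possible training-set inclusion.
    loss_fn(sample) is any per-sample loss the audited system exposes or that
    can be estimated from its output probabilities."""
    suspect_losses = [loss_fn(s) for s in suspected_samples]
    reference_losses = [loss_fn(s) for s in reference_samples]
    gap = statistics.mean(reference_losses) - statistics.mean(suspect_losses)
    pooled_sd = statistics.pstdev(suspect_losses + reference_losses) or 1e-9
    return {"gap": gap,
            "effect_size": gap / pooled_sd,
            "flag": gap / pooled_sd > 0.5}  # threshold is illustrative only

# Toy numbers standing in for per-sample losses returned by a model API.
print(audit_membership(lambda x: x,
                       suspected_samples=[1.1, 0.9, 1.0, 1.2],
                       reference_samples=[2.0, 1.8, 2.2, 1.9]))
```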
The application of visual-language large models in the field of medical health has gradually become a research focus. These models combine image understanding and natural language processing capabilities and can simultaneously process multi-modality data such as medical images and medical reports. They can not only recognize images but also understand the semantic relationship between images and texts, effectively realize the integration of medical information, and provide strong support for clinical decision-making and disease diagnosis. Visual-language large models perform well on specific medical tasks and also show strong potential and intelligence as general task models. This paper provides a comprehensive review of visual-language large models in the field of medical health. Specifically, it first introduces the basic theoretical foundations and technical principles. It then introduces the specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, the challenges, including insufficient data, interpretability, and practical deployment, are discussed. Based on these challenges, four potential future development directions are given.
Funding: the Natural Science Foundation of Hebei Province (F2024501044).
The rapid advancement of deep learning and the emergence of large-scale neural models, such as bidirectional encoder representations from transformers (BERT), generative pre-trained transformer (GPT), and large language model Meta AI (LLaMa), have brought significant computational and energy challenges. Neuromorphic computing presents a biologically inspired approach to addressing these issues, leveraging event-driven processing and in-memory computation for enhanced energy efficiency. This survey explores the intersection of neuromorphic computing and large-scale deep learning models, focusing on neuromorphic models, learning methods, and hardware. We highlight transferable techniques from deep learning to neuromorphic computing and examine the memory-related scalability limitations of current neuromorphic systems. Furthermore, we identify potential directions to enable neuromorphic systems to meet the growing demands of modern AI workloads.
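Event-driven processing, the efficiency argument in this abstract, is easiest to see in the basic spiking unit of neuromorphic systems, the leaky integrate-and-fire (LIF) neuron: computation (a spike event) happens only when the membrane potential crosses threshold. The NumPy sketch below simulates one such neuron; the time constant, threshold, and drive values are arbitrary illustrative choices rather than anything from the survey.

```python
import numpy as np

def simulate_lif(input_current, dt=1.0, tau=20.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential leaks toward
    rest, integrates the input current, and emits a binary spike (event)
    whenever it crosses threshold, after which it is reset."""
    v, spikes, trace = v_reset, [], []
    for i_t in input_current:
        v += dt / tau * (-(v - v_reset) + i_t)   # leaky integration
        fired = v >= v_thresh
        spikes.append(int(fired))
        trace.append(v)
        if fired:
            v = v_reset                          # reset after the spike event
    return np.array(spikes), np.array(trace)

# Constant drive produces a sparse, regular spike train; zero drive produces
# no events at all, which is where event-driven hardware saves energy.
spikes, _ = simulate_lif(np.full(200, 1.5))
print("spike count over 200 steps:", spikes.sum())
```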
Deep learning has become a hot field of artificial intelligence, and deep learning large model frameworks have become a strategic focus for technology companies in China and abroad. Large models play a significant role in applications, greatly improving the efficiency of training and optimization and supporting the deployment of many innovative artificial intelligence tools. Based on the Chinese PaddlePaddle large model framework, an application system is designed for the intelligent classroom teaching scenario. It uses machine vision algorithms to distinguish and present teachers' and students' behaviors, that is, a digitization and multi-class classification scheme for in-class states. Once digital data are available, they can be analyzed to evaluate the classroom status of teachers and students, upgrading traditionally subjective judgments, such as routine grades and assessments of teaching ability, to objective judgments made with artificial intelligence.
Funding: the Education Department of Hainan Province (Hnky2024-43), Sanya University's Industry-Education Integration Project (USY-CJRH2313), and the Financial Innovation and Multi-Asset Intelligent Trading Laboratory of the Key Laboratory of Philosophy and Social Sciences in Hainan Province, University of Sanya.
The rapid advancement of artificial intelligence technology is driving transformative changes in medical diagnosis, treatment, and management systems through large-scale deep learning models, a process that brings both groundbreaking opportunities and multifaceted challenges. This study focuses on the medical and healthcare applications of large-scale deep learning architectures, conducting a comprehensive survey to categorize and analyze their diverse uses. The survey results reveal that current applications of large models in healthcare encompass medical data management, healthcare services, medical devices, and preventive medicine, among others. Concurrently, large models demonstrate significant advantages in the medical domain, especially in high-precision diagnosis and prediction, data analysis and knowledge discovery, and enhancing operational efficiency. Nevertheless, we identify several challenges that need urgent attention, including improving the interpretability of large models, strengthening privacy protection, and addressing issues related to handling incomplete data. This research is dedicated to systematically elucidating the deep collaborative mechanisms between artificial intelligence and the healthcare field, providing theoretical references and practical guidance for both academia and industry.
Funding: funded by the National Natural Science Foundation of China (Grant No. 62272236) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20201136).
1 Background and motivation: Recent advances in foundation models have ushered in a paradigm shift across the field of artificial intelligence (AI), with profound implications for financial technology (FinTech). Foundation models refer to large-scale neural networks trained on vast and heterogeneous corpora using self-supervised or instruction-driven objectives, which endow them with strong generalization and transfer capabilities across downstream tasks. Representative classes of such models, including large language models (LLMs), multimodal foundation models, and time-series foundation models, exhibit emergent abilities in semantic understanding, reasoning, and multimodal representation learning.
Intelligent spatial-temporal data analysis, leveraging data such as multivariate time series and geographic information, provides researchers with powerful tools to uncover multiscale patterns and enhance decision-making processes. As artificial intelligence advances, intelligent spatial-temporal algorithms have found extensive applications across various disciplines, such as geosciences, biology, and public health [1]. Compared to traditional methods, these algorithms are data driven, making them well suited for addressing the complexities of modeling real-world systems. However, their reliance on substantial domain-specific expertise limits their broader applicability. Recently, significant advancements have been made in spatial-temporal large models. Trained on large-scale data, these models exhibit a vast parameter scale, superior generalization capabilities, and multitasking advantages over previous methods. Their high versatility and scalability position them as promising super hubs for multidisciplinary research, integrating knowledge, intelligent algorithms, and research communities from different fields. Nevertheless, achieving this vision will require overcoming numerous critical challenges, offering an expansive and profound space for future exploration.
Funding: supported by NSFC No. 62372430 and the Youth Innovation Promotion Association CAS No. 2023112.
As the computational demands driven by large model technologies continue to grow rapidly, leveraging GPU hardware to expedite parallel training has become a commonly used strategy. When computational resources within a single cluster are insufficient for large-model training, the hybrid utilization of heterogeneous acceleration hardware has emerged as a promising technical solution. The utilization of heterogeneous acceleration hardware and the scheduling of diverse cloud resources have therefore become a focal point of considerable interest. However, these computing resources are often geographically distributed. Due to the lack of awareness of heterogeneous devices and network topologies, existing parallel training frameworks struggle to leverage mixed GPU resources across constrained networks effectively. To boost the computing capability of connected heterogeneous clusters, we propose HGTrainer, an optimizer designed to plan heterogeneous parallel strategies across distributed clusters for large model training. HGTrainer can adaptively saturate heterogeneous clusters thanks to an expanded tunable parallelism space for heterogeneous accelerators, with awareness of the relatively lower inter-cluster bandwidth. To achieve this goal, we formulate the model partitioning problem among heterogeneous hardware and introduce a hierarchical searching algorithm to solve the optimization problem. In addition, a mixed-precision pipeline method is used to reduce the cost of inter-cluster communication. We evaluate HGTrainer on heterogeneous connected clusters with popular large language models. The experimental results show that HGTrainer improves training throughput by 1.49x on average for the mixed heterogeneous cluster compared with the state-of-the-art Metis.
Funding: supported by the National Key R&D Program of China (No. 2022ZD0115304), the National Natural Science Foundation of China Young Scientists Fund (No. 62402266), the National Natural Science Foundation of China for Distinguished Young Scholars (No. 62225206), and the CCF-Ant Group Research Fund CCF-AFSG (No. RF20240501).
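At its core, the partitioning problem this abstract formulates asks how to split a layer sequence into pipeline stages across devices of unequal speed while paying for slow inter-cluster links. The sketch below is a brute-force toy version of that cost model, not HGTrainer's hierarchical search; the layer costs, device speeds, and flat comm_penalty are illustrative assumptions.

```python
from itertools import combinations

def plan_pipeline(layer_costs, device_speeds, comm_penalty, same_cluster):
    """Exhaustively split the layer sequence into one contiguous stage per
    device and pick the split that minimizes the slowest stage, charging an
    extra penalty whenever adjacent stages sit in different clusters (the
    low-bandwidth inter-cluster links). Brute force is fine at this toy scale;
    a hierarchical search would replace it for real configurations."""
    n, d = len(layer_costs), len(device_speeds)
    best = (float("inf"), None)
    for cuts in combinations(range(1, n), d - 1):
        bounds = (0, *cuts, n)
        times = []
        for i in range(d):
            stage_cost = sum(layer_costs[bounds[i]:bounds[i + 1]])
            t = stage_cost / device_speeds[i]
            if i > 0 and not same_cluster[i - 1]:
                t += comm_penalty          # activation transfer across clusters
            times.append(t)
        best = min(best, (max(times), bounds))
    return best

# Eight equal layers, one fast GPU, and two slower accelerators in a second
# cluster reachable only over a penalized link between devices 0 and 1.
print(plan_pipeline(layer_costs=[1] * 8,
                    device_speeds=[2.0, 1.0, 1.0],
                    comm_penalty=0.5,
                    same_cluster=[False, True]))
```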
Accurate and efficient bacterial detection is essential for public health and medical diagnostics. However, traditional detection methods are constrained by limited dataset size, complex bacterial morphology, and diverse detection environments, hindering their effectiveness. In this study, we present EagleEyeNet, a novel multi-scale information fusion model designed to address these challenges. EagleEyeNet leverages large models as teacher networks in a knowledge distillation framework, significantly improving detection performance. Additionally, a newly designed feature fusion architecture, integrating Transformer modules, is proposed to enable the efficient fusion of global and multi-scale features, overcoming the bottlenecks posed by Feature Pyramid Network (FPN) structures, which in turn reduces information transmission loss between feature layers. To improve the model's adaptability to different scenarios, we create our own QingDao Bacteria Detection (QDBD) dataset as a comprehensive evaluation benchmark for bacterial detection. Experimental results demonstrate that EagleEyeNet achieves remarkable performance improvements, with mAP50 increases of 3.1% on the QDBD dataset and 4.9% on the AGRA dataset, outperforming the state-of-the-art (SOTA) methods in detection accuracy. These findings underscore the transformative potential of integrating large models and deep learning for advancing bacterial detection technologies.
Funding: supported by the Shandong Provincial Department of Education.
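The knowledge-distillation mechanism this abstract relies on, a large teacher guiding a compact detector, typically boils down to matching softened output distributions. The sketch below shows the standard temperature-scaled KL plus cross-entropy objective on classification logits only; the temperature, blending weight, and toy batch are illustrative assumptions, and a real detector would also distill features or per-anchor predictions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Classic knowledge-distillation objective: KL divergence between the
    temperature-softened teacher and student distributions, blended with the
    ordinary cross-entropy on ground-truth labels. The large teacher supplies
    soft targets that the compact student's classification head learns to mimic."""
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy batch: 4 samples, 5 classes.
student = torch.randn(4, 5, requires_grad=True)
teacher = torch.randn(4, 5)
labels = torch.randint(0, 5, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()  # gradients flow only into the student logits
```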
Ghost imaging (GI) enables 2D image reconstruction by leveraging high-order correlation between 1D bucket signals and 2D light field information. It demonstrates enhanced detection sensitivity and high-quality image reconstruction via efficient photon collection in scattering media. Recent studies have established that deep learning (DL) can substantially enhance GI reconstruction quality. Furthermore, with the emergence of large models such as SDXL and GPT-4, the constraints of conventional DL in parameters and architecture have been transcended, enabling models to comprehensively explore relationships among all distinct positions within feature sequences. This paradigm shift has significantly advanced the capability of DL in restoring severely degraded and low-resolution imagery, making it particularly advantageous for noise-robust image reconstruction in GI applications. In this paper, we propose the first large imaging model, with 1.4 billion parameters, that incorporates the physical principles of GI (GILM). The proposed GILM implements a skip connection mechanism to mitigate gradient explosion challenges inherent in deep architectures, ensuring sufficient parametric capacity to capture intricate correlations between single-pixel measurements and the object. Moreover, GILM leverages a multi-head attention mechanism to learn spatial dependencies across pixel points during image reconstruction, facilitating the extraction of comprehensive object information for subsequent reconstruction. We validated the effectiveness of GILM through a series of experiments, including simulated object imaging, imaging objects in free space, and imaging objects located 52 m away in an underwater environment. The experimental results demonstrate that GILM effectively captures the fluctuation trends of the collected signals, thereby facilitating accurate reconstruction of the object's image from the acquired data. Finally, GILM was successfully deployed on a portable computing platform, demonstrating its feasibility for practical engineering applications.
Funding: supported by the Natural Science Basic Research Program of Shaanxi (Grant No. 2024JC-YBMS-468), the State Key Laboratory for Underwater Information and Control (Grant No. 2024-CXPT-GF-JJ-036-09), and the National Key R&D Program of China (Grant No. 2022YFC2808003).
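To make the "correlation between 1D bucket signals and 2D light field information" concrete, here is a minimal NumPy sketch of conventional (non-learned) ghost-imaging reconstruction via second-order correlation, the physical baseline that GILM builds on rather than the model itself; the object, pattern count, and resolution are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth object (a bright square on a dark background); the reconstruction
# never sees it directly, only the 1D bucket values and the 2D patterns.
obj = np.zeros((32, 32))
obj[10:22, 10:22] = 1.0

n_patterns = 4000
patterns = rng.random((n_patterns, 32, 32))        # random illumination fields
buckets = (patterns * obj).sum(axis=(1, 2))        # single-pixel measurements

# Second-order correlation <B * I(x, y)> - <B><I(x, y)>: positions whose
# intensity co-fluctuates with the bucket signal belong to the object.
recon = (buckets[:, None, None] * patterns).mean(axis=0) \
        - buckets.mean() * patterns.mean(axis=0)
recon = (recon - recon.min()) / (recon.max() - recon.min())
print("correlation with ground truth:",
      np.corrcoef(recon.ravel(), obj.ravel())[0, 1])
```

A learned reconstructor replaces the closing correlation step with a network trained to map bucket sequences to images, which is where the attention and skip-connection machinery described above comes in.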
Recent advances in large models demonstrate significant prospects for transforming the field of medical imaging. These models, including large language models, large visual models, and multimodal large models, offer unprecedented capabilities in processing and interpreting complex medical data across various imaging modalities. By leveraging self-supervised pretraining on vast unlabeled datasets, cross-modal representation learning, and domain-specific medical knowledge adaptation through fine-tuning, large models can achieve higher diagnostic accuracy and more efficient workflows for key clinical tasks. This review summarizes the concepts, methods, and progress of large models in medical imaging, highlighting their potential in precision medicine. The article first outlines the integration of multimodal data under large model technologies, approaches for training large models with medical datasets, and the need for robust evaluation metrics. It then explores how large models can revolutionize applications in critical tasks such as image segmentation, disease diagnosis, personalized treatment strategies, and real-time interactive systems, thus pushing the boundaries of traditional imaging analysis. Despite their potential, the practical implementation of large models in medical imaging faces notable challenges, including the scarcity of high-quality medical data, the need for optimized perception of imaging phenotypes, safety considerations, and seamless integration with existing clinical workflows and equipment. As research progresses, the development of more efficient, interpretable, and generalizable models will be critical to ensuring their reliable deployment across diverse clinical environments. This review aims to provide insights into the current state of the field and to offer directions for future research that will facilitate the broader adoption of large models in clinical practice.
Funding: supported by the National Key R&D Program of China (2023YFC2415200), the National Natural Science Foundation of China (82361168664, 82372053, 82441018, U24A20759, 62222609, 62076236, 32350010, 82302407, 82302296), the Beijing Natural Science Foundation (JQ24048, 7232346), the Beijing Nova Program (20240484528), the Science and Technology Development Fund of Macao Special Administrative Region (0006/2023/AFJ), and the China Postdoctoral Science Foundation (2022M720357).