BACKGROUND Gastric cancer(GC)is a prevalent tumor in the digestive system,with around one million new cases reported annually,ranking it as the third most common malignancy.Reducing pain is a key research focus.This s...BACKGROUND Gastric cancer(GC)is a prevalent tumor in the digestive system,with around one million new cases reported annually,ranking it as the third most common malignancy.Reducing pain is a key research focus.This study evaluates the effect of nalbuphine on the analgesic effect and the expression of pain factors in patients after radical resection.AIM To provide a reference for postoperative analgesia methods.METHODS One hundred eight patients with GC,admitted between January 2022 and June 2024,underwent radical gastrectomy.They received a controlled analgesia pump and a transverse abdominis muscle plane block,divided into two groups of 54 patients in each group.The control group received sufentanil,while the observation group received nalbuphine as an analgesic.Postoperative analgesic effects,pain factor expression,and adverse effects were compared.RESULTS The resting pain and activity pain scores in the observation group at 6,12,24 and 48 hours were significantly lower than those in the control group.Additionally,the number of presses and consumption of the observation group at 48 hours were lower than those of the control group;and the response rate of the observation group was higher than that of the control group(P<0.05).The prostaglandin E2,substance P,and serotonin levels 24 hours after the observation group were lower than those in the control group,and the incidence of adverse reactions was 5.56%lower than 22.22%in the control group(P<0.05).CONCLUSION The findings suggest that nalbuphine enhances postoperative multimodal analgesia in patients with radical GC,effectively improving postoperative analgesic effect,relieving postoperative resting and active pain,and reducing postoperative pain factor expression,demonstrating its potential for clinical application.展开更多
Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,a...Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,and improve the reliability and safety of the system.Artificial intelligence(AI),referring to the simulation of human intelligence in machines that are programmed to think and learn like humans,represents a pivotal frontier in modern scientific research.With the continuous development and promotion of AI technology in Sensor 4.0 age,multimodal sensor fusion is becoming more and more intelligent and automated,and is expected to go further in the future.With this context,this review article takes a comprehensive look at the recent progress on AI-enhanced multimodal sensors and their integrated devices and systems.Based on the concept and principle of sensor technologies and AI algorithms,the theoretical underpinnings,technological breakthroughs,and pragmatic applications of AI-enhanced multimodal sensors in various fields such as robotics,healthcare,and environmental monitoring are highlighted.Through a comparative study of the dual/tri-modal sensors with and without using AI technologies(especially machine learning and deep learning),AI-enhanced multimodal sensors highlight the potential of AI to improve sensor performance,data processing,and decision-making capabilities.Furthermore,the review analyzes the challenges and opportunities afforded by AI-enhanced multimodal sensors,and offers a prospective outlook on the forthcoming advancements.展开更多
Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.There...Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.Therefore,it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations.This paper introduces a new algorithm,called thunderstorm wind gust identification network(TGNet).It leverages multimodal feature fusion to fuse the temporal and spatial features of thunderstorm wind gust events.The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations,which is aimed at distinguishing thunderstorm wind gusts from those caused by synoptic-scale systems or typhoons.Then,the encoder,structured upon the U-shaped network(U-Net)and incorporating recurrent residual convolutional blocks(R2U-Net),is employed to extract the corresponding spatial convective characteristics of satellite,radar,and lightning observations.Finally,by using the multimodal deep fusion module based on multi-head cross-attention,the temporal features of wind speed at each automatic weather station are incorporated into the spatial features to obtain 10-minutely classification of thunderstorm wind gusts.TGNet products have high accuracy,with a critical success index reaching 0.77.Compared with those of U-Net and R2U-Net,the false alarm rate of TGNet products decreases by 31.28%and 24.15%,respectively.The new algorithm provides grid products of thunderstorm wind gusts with a spatial resolution of 0.01°,updated every 10minutes.The results are finer and more accurate,thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.展开更多
Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dim...Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dimensional healthcare data,encompassing genomic,transcriptomic,and other omics profiles,as well as radiological imaging and histopathological slides,makes this approach increasingly important because,when examined separately,these data sources only offer a fragmented picture of intricate disease processes.Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling,more robust disease characterization,and improved treatment decision-making.This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis.We classify and examine important application domains,such as(1)radiology,where automated report generation and lesion detection are facilitated by image-text integration;(2)histopathology,where fusion models improve tumor classification and grading;and(3)multi-omics,where molecular subtypes and latent biomarkers are revealed through cross-modal learning.We provide an overview of representative research,methodological advancements,and clinical consequences for each domain.Additionally,we critically analyzed the fundamental issues preventing wider adoption,including computational complexity(particularly in training scalable,multi-branch networks),data heterogeneity(resulting from modality-specific noise,resolution variations,and inconsistent annotations),and the challenge of maintaining significant cross-modal correlations during fusion.These problems impede interpretability,which is crucial for clinical trust and use,in addition to performance and generalizability.Lastly,we outline important areas for future research,including the development of standardized protocols for harmonizing data,the creation of lightweight and interpretable fusion architectures,the integration of real-time clinical decision support systems,and the promotion of cooperation for federated multimodal learning.Our goal is to provide researchers and clinicians with a concise overview of the field’s present state,enduring constraints,and exciting directions for further research through this review.展开更多
Artificial intelligence(AI)is driving a paradigm shift in gastroenterology and hepa-tology by delivering cutting-edge tools for disease screening,diagnosis,treatment,and prognostic management.Through deep learning,rad...Artificial intelligence(AI)is driving a paradigm shift in gastroenterology and hepa-tology by delivering cutting-edge tools for disease screening,diagnosis,treatment,and prognostic management.Through deep learning,radiomics,and multimodal data integration,AI has achieved diagnostic parity with expert cli-nicians in endoscopic image analysis(e.g.,early gastric cancer detection,colorectal polyp identification)and non-invasive assessment of liver pathologies(e.g.,fibrosis staging,fatty liver typing)while demonstrating utility in personalized care scenarios such as predicting hepatocellular carcinoma recurrence and opti-mizing inflammatory bowel disease treatment responses.Despite these advance-ments challenges persist including limited model generalization due to frag-mented datasets,algorithmic limitations in rare conditions(e.g.,pediatric liver diseases)caused by insufficient training data,and unresolved ethical issues related to bias,accountability,and patient privacy.Mitigation strategies involve constructing standardized multicenter databases,validating AI tools through prospective trials,leveraging federated learning to address data scarcity,and de-veloping interpretable systems(e.g.,attention heatmap visualization)to enhance clinical trust.Integrating generative AI,digital twin technologies,and establishing unified ethical/regulatory frameworks will accelerate AI adoption in primary care and foster equitable healthcare access while interdisciplinary collaboration and evidence-based implementation remain critical for realizing AI’s potential to redefine precision care for digestive disorders,improve global health outcomes,and reshape healthcare equity.展开更多
Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires a...Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires and state-of-the-art flexible sensors are still interferential for being attached to the body.Herein,we develop a flexible-integrated multimodal sensing patch based on hydrogel and its application in unconstraint sleep monitoring.The patch comprises a bottom hydrogel-based dualmode pressure–temperature sensing layer and a top electrospun nanofiber-based non-contact detection layer as one integrated device.The hydrogel as core substrate exhibits strong toughness and water retention,and the multimodal sensing of temperature,pressure,and non-contact proximity is realized based on different sensing mechanisms with no crosstalk interference.The multimodal sensing function is verified in a simulated real-world scenario by a robotic hand grasping objects to validate its practicability.Multiple multimodal sensing patches integrated on different locations of a pillow are assembled for intelligent sleep monitoring.Versatile human–pillow interaction information as well as their evolution over time are acquired and analyzed by a one-dimensional convolutional neural network.Track of head movement and recognition of bad patterns that may lead to poor sleep are achieved,which provides a promising approach for sleep monitoring.展开更多
Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimo...Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimodal Aspect-oriented Sentiment Classification(MASC).Currently,most existing models for JMASA only perform text and image feature encoding from a basic level,but often neglect the in-depth analysis of unimodal intrinsic features,which may lead to the low accuracy of aspect term extraction and the poor ability of sentiment prediction due to the insufficient learning of intra-modal features.Given this problem,we propose a Text-Image Feature Fine-grained Learning(TIFFL)model for JMASA.First,we construct an enhanced adjacency matrix of word dependencies and adopt graph convolutional network to learn the syntactic structure features for text,which addresses the context interference problem of identifying different aspect terms.Then,the adjective-noun pairs extracted from image are introduced to enable the semantic representation of visual features more intuitive,which addresses the ambiguous semantic extraction problem during image feature learning.Thereby,the model performance of aspect term extraction and sentiment polarity prediction can be further optimized and enhanced.Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA,MATE and MASC,thus validating the effectiveness of our proposed methods.展开更多
Fourier Ptychographic Microscopy(FPM)is a high-throughput computational optical imaging technology reported in 2013.It effectively breaks through the trade-off between high-resolution imaging and wide-field imaging.In...Fourier Ptychographic Microscopy(FPM)is a high-throughput computational optical imaging technology reported in 2013.It effectively breaks through the trade-off between high-resolution imaging and wide-field imaging.In recent years,it has been found that FPM is not only a tool to break through the trade-off between field of view and spatial resolution,but also a paradigm to break through those trade-off problems,thus attracting extensive attention.Compared with previous reviews,this review does not introduce its concept,basic principles,optical system and series of applications once again,but focuses on elaborating the three major difficulties faced by FPM technology in the process from“looking good”in the laboratory to“working well”in practical applications:mismatch between numerical model and physical reality,long reconstruction time and high computing power demand,and lack of multi-modal expansion.It introduces how to achieve key technological innovations in FPM through the dual drive of Artificial Intelligence(AI)and physics,including intelligent reconstruction algorithms introducing machine learning concepts,optical-algorithm co-design,fusion of frequency domain extrapolation methods and generative adversarial networks,multi-modal imaging schemes and data fusion enhancement,etc.,gradually solving the difficulties of FPM technology.Conversely,this review deeply considers the unique value of FPM technology in potentially feeding back to the development of“AI+optics”,such as providing AI benchmark tests under physical constraints,inspirations for the balance of computing power and bandwidth in miniaturized intelligent microscopes,and photoelectric hybrid architectures.Finally,it introduces the industrialization path and frontier directions of FPM technology,pointing out that with the promotion of the dual drive of AI and physics,it will generate a large number of industrial application case,and looks forward to the possibilities of future application scenarios and expansions,for instance,body fluid biopsy and point-of-care testing at the grassroots level represent the expansion of the growth market.展开更多
Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces ...Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces significant data scarcity challenges,and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting issues.In addition,it is a challenge to understand the underlying relationship between text and images in the hateful memes.To address these issues,we propose a multimodal hateful memes classification model named LABF,which is based on low-rank adapter layers and bidirectional gated feature fusion.Firstly,low-rank adapter layers are adopted to learn the feature representation of the new dataset.This is achieved by introducing a small number of additional parameters while retaining prior knowledge of the CLIP model,which effectively alleviates the overfitting phenomenon.Secondly,a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion.Experimental results show that the method significantly outperforms existing methods on two public datasets,verifying its effectiveness and robustness.展开更多
Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved percepti...Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved perception,generation,and decision-making in various fields.However,their vast scale and complexity bring about new security challenges.Issues such as backdoor vulnerabilities during training,jailbreaking in multimodal rea⁃soning,and data provenance and copyright auditing have made security a critical focus for both academia and industry.展开更多
1.The development history of enhanced recovery after surgery(ERAS)Enhanced recovery after surgery(ERAS)is a multimodal perioperative care approach that has evolved over the past 2 decades since its inception.In 1997,P...1.The development history of enhanced recovery after surgery(ERAS)Enhanced recovery after surgery(ERAS)is a multimodal perioperative care approach that has evolved over the past 2 decades since its inception.In 1997,Professor Henrik Kehlet,also known as the“father of ERAS”,from the University of Copenhagen in Denmark first proposed the ERAS concept and discovered its clinical feasibility and superiority,achieving remarkable results.ERAS was initially applied in colorectal surgery;subsequently,the concept gradually gained popularity and application worldwide.展开更多
Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate...Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer.In this paper,we propose a VQA system intended to answer yes/no questions about real-world images,in Arabic.To support a robust VQA system,we work in two directions:(1)Using deep neural networks to semantically represent the given image and question in a fine-grainedmanner,namely ResNet-152 and Gated Recurrent Units(GRU).(2)Studying the role of the utilizedmultimodal bilinear pooling fusion technique in the trade-o.between the model complexity and the overall model performance.Some fusion techniques could significantly increase the model complexity,which seriously limits their applicability for VQA models.So far,there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions.Hence,a comparative analysis is conducted between eight bilinear pooling fusion techniques,in terms of their ability to reduce themodel complexity and improve themodel performance in this case of VQA systems.Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance,until reaching the best performance of 89.25%.Further,experiments have proven that the number of answers in the developed VQA system is a critical factor that a.ects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity.The Multimodal Local Perception Bilinear Pooling(MLPB)technique has shown the best balance between the model complexity and its performance,for VQA systems designed to answer yes/no questions.展开更多
BACKGROUND Primary gastrointestinal lymphoma(PGIL)is a relatively uncommon clinical entity,exhibiting distinctive features including occult primary sites,nonspecific clinical presentations,and considerable diagnostic ...BACKGROUND Primary gastrointestinal lymphoma(PGIL)is a relatively uncommon clinical entity,exhibiting distinctive features including occult primary sites,nonspecific clinical presentations,and considerable diagnostic and therapeutic difficulties.Consequently,comprehensive clinical investigations into its clinicopathological characteristics and surgical intervention value are warranted to enhance dia-gnostic and therapeutic proficiency.AIM To investigate the clinicopathological characteristics and surgical significance of PGIL from a surgical perspective,providing a theoretical basis for optimizing diagnostic and therapeutic strategies.METHODS This study included 50 cases of PGIL treated by the General Surgery Department of the Chinese PLA Air Force Medical Center from June 2001 to March 2025.Data were extracted from the Electronic Medical Record system for retrospective analysis.A retrospective analysis was conducted on their epidemiological,clinical manifestations,imaging,pathological features,and treatment outcomes.Descriptive statistics were applied for data summarization,with continuous variables presented as frequencies and percentages.Correlations between variables were assessed using the Spearman rank correlation coefficient.RESULTS All cases had the gastrointestinal tract as the primary site.Abdominal pain was the most common initial symptom(52.0%),with 80.0%of patients experiencing pain during the course of the disease,and 38.0%experiencing hema-tochezia/melena or anemia.Computed tomography diagnosis exhibited a high overall sensitivity(94.3%);the en-doscopic detection rate was 91.5%.Diffuse large B-cell lymphoma was the most common subtype(52.0%).The im-provement rate was higher in the surgery combined with chemotherapy group than in the chemotherapy only group.The incidence of postoperative complications was 26.5%,all occurring in patients with tumors>5 cm.CONCLUSION Diffuse large B-cell lymphoma is the primary PGIL subtype.Imaging and endoscopic biopsy are diagnostic es-sentials.Surgery aids in resection,complication management,and pathologic diagnosis.Multidisciplinary,indi-vidualized strategies are recommended,necessitating further prospective molecular studies.展开更多
In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstac...In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstacle-surmounting capabilities.To achieve this,the robot employs a novel combination of a jumping module and a crawling module.The jumping module features improved energy storage capacity and an active clutch.Within the constraints of structural robustness,the jumping module maximizes the explosive power of the linear spring by utilizing the mechanical advantage of a closed-loop mechanism and controls the energy flow of the jumping module through an active clutch mechanism.Furthermore,inspired by the limb movements of tortoises during crawling and self-righting,a single-degree-of-freedom spatial four-bar crawling mechanism was designed to enable crawling,steering,and resetting functions.To demonstrate its practicality,the integrated jumping-crawling robot was tested in a laboratory environment for functions such as jumping,crawling,self-resetting,and steering.Experimental results confirmed the feasibility of the proposed integrated jumping-crawling robot.展开更多
With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existi...With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existing studies focus on the negative impacts of spoof stickers,while paying insufficient attention to their positive functions.From the perspective of multimodal metaphor,this paper uses methods such as virtual ethnography and image-text analysis to clarify the connotation of stickers,understand the evolution of their digital dissemination forms,and explore the multiple functions of subcultural stickers in the social interactions between teachers and students.Young students use stickers to convey emotions and information.Their expressive function,social function,and cultural metaphor function progress in a progressive manner.This not only shapes students’values but also promotes self-expression and teacher-student interaction.It also reminds teachers to correct students’negative thoughts by using stickers,achieving the effect of“cultivating and influencing people through culture.”展开更多
Population migration data derived from location-based services has often been used to delineate population flows between cities or construct intercity relationship networks to reveal and explore the complex interactio...Population migration data derived from location-based services has often been used to delineate population flows between cities or construct intercity relationship networks to reveal and explore the complex interaction patterns underlying human activities.Nevertheless,the inherent heterogeneity in multimodal migration big data has been ignored.This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association.Initially,the intercity interactive networks in China were constructed,utilizing migration data from Baidu and AutoNavi collected during the same time period.Subsequently,the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall(network)and local(node)perspectives.Furthermore,the precision of these networks at the local scale is corroborated by constructing an intercity network from mobile phone(MP)data.Results indicate that the intercity interactive networks in China,as delineated by Baidu and AutoNavi migration flows,exhibit a high degree of structure equivalence.The correlation coefficient between these two networks is 0.874.Both networks exhibit a pronounced spatial polarization trend and hierarchical structure.This is evident in their distinct core and peripheral structures,as well as in the varying importance and influence of different nodes within the networks.Nevertheless,there are notable differences worthy of attention.Baidu intercity interactive network exhibits pronounced cross-regional effects,and its high-level interactions are characterized by a“rich-club”phenomenon.The AutoNavi intercity interactive network presents a more significant distance attenuation effect,and the high-level interactions display a gradient distribution pattern.Notably,there exists a substantial correlation between the AutoNavi and MP networks at the local scale,evidenced by a high correlation coefficient of 0.954.Furthermore,the“spatial dislocations”phenomenon was observed within the spatial structures at different levels,extracted from the Baidu and AutoNavi intercity networks.However,the measured results of network spatial structure similarity from three dimensions,namely,node location,node size,and local structure,indicate a relatively high similarity and consistency between the two networks.展开更多
With the rapid growth of the Internet and social media, information is widely disseminated in multimodal forms, such as text and images, where discriminatory content can manifest in various ways. Discrimination detect...With the rapid growth of the Internet and social media, information is widely disseminated in multimodal forms, such as text and images, where discriminatory content can manifest in various ways. Discrimination detection techniques for multilingual and multimodal data can identify potential discriminatory behavior and help foster a more equitable and inclusive cyberspace. However, existing methods often struggle in complex contexts and multilingual environments. To address these challenges, this paper proposes an innovative detection method, using image and multilingual text encoders to separately extract features from different modalities. It continuously updates a historical feature memory bank, aggregates the Top-K most similar samples, and utilizes a Gated Recurrent Unit (GRU) to integrate current and historical features, generating enhanced feature representations with stronger semantic expressiveness to improve the model’s ability to capture discriminatory signals. Experimental results demonstrate that the proposed method exhibits superior discriminative power and detection accuracy in multilingual and multimodal contexts, offering a reliable and effective solution for identifying discriminatory content.展开更多
Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion...Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion between millimeter-wave radar(with an accuracy of±0.1°)and an RGB-D camera is achieved through multisensor spatiotemporal registration technology,and a dataset suitable for guide dog robots is constructed.For the application scenario of edge-end guide dog robots,a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is innovatively adopted,achieving a comprehensive recognition accuracy of 95.8% in complex scenarios,which is 2.2% higher than that of the benchmark YOLOv11 network.The system supports navigation on complex terrains such as stairs(25 cm steps)and slopes(35°gradient),and the response time to sudden disturbances is shortened to 100 ms.Actual tests show that the navigation success rate reaches 95% in eight types of scenarios,the user satisfaction score is 4.8/5.0,and the cost is 50% lower than that of traditional guide dogs.展开更多
A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehens...A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehensive understanding of LLM architectural performance,strengths,and weaknesses.This gap precludes finding the appropriate models for task-specific applications and limits awareness of emerging LLM optimization and deployment strategies.In this research,50 studies on 25+LLMs,including GPT-3,GPT-4,Claude 3.5,DeepKet,and hybrid multimodal frameworks like ContextDET and GeoRSCLIP,are thoroughly reviewed.We propose LLM application taxonomy by grouping techniques by task focus—healthcare,chemistry,sentiment analysis,agent-based simulations,and multimodal integration.Advanced methods like parameter-efficient tuning(LoRA),quantumenhanced embeddings(DeepKet),retrieval-augmented generation(RAG),and safety-focused models(GalaxyGPT)are evaluated for dataset requirements,computational efficiency,and performance measures.Frameworks for ethical issues,data limited hallucinations,and KDGI-enhanced fine-tuning like Woodpecker’s post-remedy corrections are highlighted.The investigation’s scope,mad,and methods are described,but the primary results are not.The work reveals that domain-specialized fine-tuned LLMs employing RAG and quantum-enhanced embeddings performbetter for context-heavy applications.In medical text normalization,ChatGPT-4 outperforms previous models,while two multimodal frameworks,GeoRSCLIP,increase remote sensing.Parameter-efficient tuning technologies like LoRA have minimal computing cost and similar performance,demonstrating the necessity for adaptive models in multiple domains.To discover the optimum domain-specific models,explain domain-specific fine-tuning,and present quantum andmultimodal LLMs to address scalability and cross-domain issues.The framework helps academics and practitioners identify,adapt,and innovate LLMs for different purposes.This work advances the field of efficient,interpretable,and ethical LLM application research.展开更多
文摘BACKGROUND Gastric cancer(GC)is a prevalent tumor in the digestive system,with around one million new cases reported annually,ranking it as the third most common malignancy.Reducing pain is a key research focus.This study evaluates the effect of nalbuphine on the analgesic effect and the expression of pain factors in patients after radical resection.AIM To provide a reference for postoperative analgesia methods.METHODS One hundred eight patients with GC,admitted between January 2022 and June 2024,underwent radical gastrectomy.They received a controlled analgesia pump and a transverse abdominis muscle plane block,divided into two groups of 54 patients in each group.The control group received sufentanil,while the observation group received nalbuphine as an analgesic.Postoperative analgesic effects,pain factor expression,and adverse effects were compared.RESULTS The resting pain and activity pain scores in the observation group at 6,12,24 and 48 hours were significantly lower than those in the control group.Additionally,the number of presses and consumption of the observation group at 48 hours were lower than those of the control group;and the response rate of the observation group was higher than that of the control group(P<0.05).The prostaglandin E2,substance P,and serotonin levels 24 hours after the observation group were lower than those in the control group,and the incidence of adverse reactions was 5.56%lower than 22.22%in the control group(P<0.05).CONCLUSION The findings suggest that nalbuphine enhances postoperative multimodal analgesia in patients with radical GC,effectively improving postoperative analgesic effect,relieving postoperative resting and active pain,and reducing postoperative pain factor expression,demonstrating its potential for clinical application.
基金supported by the National Natural Science Foundation of China(No.62404111)Natural Science Foundation of Jiangsu Province(No.BK20240635)+2 种基金Natural Science Foundation of the Jiangsu Higher Education Institutions of China(No.24KJB510025)Natural Science Research Start-up Foundation of Recruiting Talents of Nanjing University of Posts and Telecommunications(No.NY223157 and NY223156)Opening Project of Advanced Inte-grated Circuit Package and Testing Research Center of Jiangsu Province(No.NTIKFJJ202303).
文摘Multimodal sensor fusion can make full use of the advantages of various sensors,make up for the shortcomings of a single sensor,achieve information verification or information security through information redundancy,and improve the reliability and safety of the system.Artificial intelligence(AI),referring to the simulation of human intelligence in machines that are programmed to think and learn like humans,represents a pivotal frontier in modern scientific research.With the continuous development and promotion of AI technology in Sensor 4.0 age,multimodal sensor fusion is becoming more and more intelligent and automated,and is expected to go further in the future.With this context,this review article takes a comprehensive look at the recent progress on AI-enhanced multimodal sensors and their integrated devices and systems.Based on the concept and principle of sensor technologies and AI algorithms,the theoretical underpinnings,technological breakthroughs,and pragmatic applications of AI-enhanced multimodal sensors in various fields such as robotics,healthcare,and environmental monitoring are highlighted.Through a comparative study of the dual/tri-modal sensors with and without using AI technologies(especially machine learning and deep learning),AI-enhanced multimodal sensors highlight the potential of AI to improve sensor performance,data processing,and decision-making capabilities.Furthermore,the review analyzes the challenges and opportunities afforded by AI-enhanced multimodal sensors,and offers a prospective outlook on the forthcoming advancements.
基金supported by the National Key Research and Development Program of China(Grant No.2022YFC3004104)the National Natural Science Foundation of China(Grant No.U2342204)+4 种基金the Innovation and Development Program of the China Meteorological Administration(Grant No.CXFZ2024J001)the Open Research Project of the Key Open Laboratory of Hydrology and Meteorology of the China Meteorological Administration(Grant No.23SWQXZ010)the Science and Technology Plan Project of Zhejiang Province(Grant No.2022C03150)the Open Research Fund Project of Anyang National Climate Observatory(Grant No.AYNCOF202401)the Open Bidding for Selecting the Best Candidates Program(Grant No.CMAJBGS202318)。
文摘Thunderstorm wind gusts are small in scale,typically occurring within a range of a few kilometers.It is extremely challenging to monitor and forecast thunderstorm wind gusts using only automatic weather stations.Therefore,it is necessary to establish thunderstorm wind gust identification techniques based on multisource high-resolution observations.This paper introduces a new algorithm,called thunderstorm wind gust identification network(TGNet).It leverages multimodal feature fusion to fuse the temporal and spatial features of thunderstorm wind gust events.The shapelet transform is first used to extract the temporal features of wind speeds from automatic weather stations,which is aimed at distinguishing thunderstorm wind gusts from those caused by synoptic-scale systems or typhoons.Then,the encoder,structured upon the U-shaped network(U-Net)and incorporating recurrent residual convolutional blocks(R2U-Net),is employed to extract the corresponding spatial convective characteristics of satellite,radar,and lightning observations.Finally,by using the multimodal deep fusion module based on multi-head cross-attention,the temporal features of wind speed at each automatic weather station are incorporated into the spatial features to obtain 10-minutely classification of thunderstorm wind gusts.TGNet products have high accuracy,with a critical success index reaching 0.77.Compared with those of U-Net and R2U-Net,the false alarm rate of TGNet products decreases by 31.28%and 24.15%,respectively.The new algorithm provides grid products of thunderstorm wind gusts with a spatial resolution of 0.01°,updated every 10minutes.The results are finer and more accurate,thereby helping to improve the accuracy of operational warnings for thunderstorm wind gusts.
文摘Multimodal deep learning has emerged as a key paradigm in contemporary medical diagnostics,advancing precision medicine by enabling integration and learning from diverse data sources.The exponential growth of high-dimensional healthcare data,encompassing genomic,transcriptomic,and other omics profiles,as well as radiological imaging and histopathological slides,makes this approach increasingly important because,when examined separately,these data sources only offer a fragmented picture of intricate disease processes.Multimodal deep learning leverages the complementary properties of multiple data modalities to enable more accurate prognostic modeling,more robust disease characterization,and improved treatment decision-making.This review provides a comprehensive overview of the current state of multimodal deep learning approaches in medical diagnosis.We classify and examine important application domains,such as(1)radiology,where automated report generation and lesion detection are facilitated by image-text integration;(2)histopathology,where fusion models improve tumor classification and grading;and(3)multi-omics,where molecular subtypes and latent biomarkers are revealed through cross-modal learning.We provide an overview of representative research,methodological advancements,and clinical consequences for each domain.Additionally,we critically analyzed the fundamental issues preventing wider adoption,including computational complexity(particularly in training scalable,multi-branch networks),data heterogeneity(resulting from modality-specific noise,resolution variations,and inconsistent annotations),and the challenge of maintaining significant cross-modal correlations during fusion.These problems impede interpretability,which is crucial for clinical trust and use,in addition to performance and generalizability.Lastly,we outline important areas for future research,including the development of standardized protocols for harmonizing data,the creation of lightweight and interpretable fusion architectures,the integration of real-time clinical decision support systems,and the promotion of cooperation for federated multimodal learning.Our goal is to provide researchers and clinicians with a concise overview of the field’s present state,enduring constraints,and exciting directions for further research through this review.
基金Supported by the Natural Science Foundation of Jilin Province,No.YDZJ202401182ZYTSJilin Provincial Key Laboratory of Precision Infectious Diseases,No.20200601011JCJilin Provincial Engineering Laboratory of Precision Prevention and Control for Common Diseases,Jilin Province Development and Reform Commission,No.2022C036.
文摘Artificial intelligence(AI)is driving a paradigm shift in gastroenterology and hepa-tology by delivering cutting-edge tools for disease screening,diagnosis,treatment,and prognostic management.Through deep learning,radiomics,and multimodal data integration,AI has achieved diagnostic parity with expert cli-nicians in endoscopic image analysis(e.g.,early gastric cancer detection,colorectal polyp identification)and non-invasive assessment of liver pathologies(e.g.,fibrosis staging,fatty liver typing)while demonstrating utility in personalized care scenarios such as predicting hepatocellular carcinoma recurrence and opti-mizing inflammatory bowel disease treatment responses.Despite these advance-ments challenges persist including limited model generalization due to frag-mented datasets,algorithmic limitations in rare conditions(e.g.,pediatric liver diseases)caused by insufficient training data,and unresolved ethical issues related to bias,accountability,and patient privacy.Mitigation strategies involve constructing standardized multicenter databases,validating AI tools through prospective trials,leveraging federated learning to address data scarcity,and de-veloping interpretable systems(e.g.,attention heatmap visualization)to enhance clinical trust.Integrating generative AI,digital twin technologies,and establishing unified ethical/regulatory frameworks will accelerate AI adoption in primary care and foster equitable healthcare access while interdisciplinary collaboration and evidence-based implementation remain critical for realizing AI’s potential to redefine precision care for digestive disorders,improve global health outcomes,and reshape healthcare equity.
基金supported by the National Key Research and Development Program of China under Grant(2024YFE0100400)Taishan Scholars Project Special Funds(tsqn202312035)+2 种基金the open research foundation of State Key Laboratory of Integrated Chips and Systems,the Tianjin Science and Technology Plan Project(No.22JCZDJC00630)the Higher Education Institution Science and Technology Research Project of Hebei Province(No.JZX2024024)Jinan City-University Integrated Development Strategy Project under Grant(JNSX2023017).
文摘Sleep monitoring is an important part of health management because sleep quality is crucial for restoration of human health.However,current commercial products of polysomnography are cumbersome with connecting wires and state-of-the-art flexible sensors are still interferential for being attached to the body.Herein,we develop a flexible-integrated multimodal sensing patch based on hydrogel and its application in unconstraint sleep monitoring.The patch comprises a bottom hydrogel-based dualmode pressure–temperature sensing layer and a top electrospun nanofiber-based non-contact detection layer as one integrated device.The hydrogel as core substrate exhibits strong toughness and water retention,and the multimodal sensing of temperature,pressure,and non-contact proximity is realized based on different sensing mechanisms with no crosstalk interference.The multimodal sensing function is verified in a simulated real-world scenario by a robotic hand grasping objects to validate its practicability.Multiple multimodal sensing patches integrated on different locations of a pillow are assembled for intelligent sleep monitoring.Versatile human–pillow interaction information as well as their evolution over time are acquired and analyzed by a one-dimensional convolutional neural network.Track of head movement and recognition of bad patterns that may lead to poor sleep are achieved,which provides a promising approach for sleep monitoring.
基金supported by the Science and Technology Project of Henan Province(No.222102210081).
文摘Joint Multimodal Aspect-based Sentiment Analysis(JMASA)is a significant task in the research of multimodal fine-grained sentiment analysis,which combines two subtasks:Multimodal Aspect Term Extraction(MATE)and Multimodal Aspect-oriented Sentiment Classification(MASC).Currently,most existing models for JMASA only perform text and image feature encoding from a basic level,but often neglect the in-depth analysis of unimodal intrinsic features,which may lead to the low accuracy of aspect term extraction and the poor ability of sentiment prediction due to the insufficient learning of intra-modal features.Given this problem,we propose a Text-Image Feature Fine-grained Learning(TIFFL)model for JMASA.First,we construct an enhanced adjacency matrix of word dependencies and adopt graph convolutional network to learn the syntactic structure features for text,which addresses the context interference problem of identifying different aspect terms.Then,the adjective-noun pairs extracted from image are introduced to enable the semantic representation of visual features more intuitive,which addresses the ambiguous semantic extraction problem during image feature learning.Thereby,the model performance of aspect term extraction and sentiment polarity prediction can be further optimized and enhanced.Experiments on two Twitter benchmark datasets demonstrate that TIFFL achieves competitive results for JMASA,MATE and MASC,thus validating the effectiveness of our proposed methods.
基金National Natural Science Foundation of China(No.12574332)the Space Optoelectronic Measurement and Perception Lab.,Beijing Institute of Control Engineering(No.LabSOMP-2023-10)Major Science and Technology Innovation Program of Xianyang City(No.L2024-ZDKJ-ZDCGZH-0021)。
文摘Fourier Ptychographic Microscopy(FPM)is a high-throughput computational optical imaging technology reported in 2013.It effectively breaks through the trade-off between high-resolution imaging and wide-field imaging.In recent years,it has been found that FPM is not only a tool to break through the trade-off between field of view and spatial resolution,but also a paradigm to break through those trade-off problems,thus attracting extensive attention.Compared with previous reviews,this review does not introduce its concept,basic principles,optical system and series of applications once again,but focuses on elaborating the three major difficulties faced by FPM technology in the process from“looking good”in the laboratory to“working well”in practical applications:mismatch between numerical model and physical reality,long reconstruction time and high computing power demand,and lack of multi-modal expansion.It introduces how to achieve key technological innovations in FPM through the dual drive of Artificial Intelligence(AI)and physics,including intelligent reconstruction algorithms introducing machine learning concepts,optical-algorithm co-design,fusion of frequency domain extrapolation methods and generative adversarial networks,multi-modal imaging schemes and data fusion enhancement,etc.,gradually solving the difficulties of FPM technology.Conversely,this review deeply considers the unique value of FPM technology in potentially feeding back to the development of“AI+optics”,such as providing AI benchmark tests under physical constraints,inspirations for the balance of computing power and bandwidth in miniaturized intelligent microscopes,and photoelectric hybrid architectures.Finally,it introduces the industrialization path and frontier directions of FPM technology,pointing out that with the promotion of the dual drive of AI and physics,it will generate a large number of industrial application case,and looks forward to the possibilities of future application scenarios and expansions,for instance,body fluid biopsy and point-of-care testing at the grassroots level represent the expansion of the growth market.
基金supported by the Funding for Research on the Evolution of Cyberbullying Incidents and Intervention Strategies(24BSH033)Discipline Innovation and Talent Introduction Bases in Higher Education Institutions(B20087).
文摘Hateful meme is a multimodal medium that combines images and texts.The potential hate content of hateful memes has caused serious problems for social media security.The current hateful memes classification task faces significant data scarcity challenges,and direct fine-tuning of large-scale pre-trained models often leads to severe overfitting issues.In addition,it is a challenge to understand the underlying relationship between text and images in the hateful memes.To address these issues,we propose a multimodal hateful memes classification model named LABF,which is based on low-rank adapter layers and bidirectional gated feature fusion.Firstly,low-rank adapter layers are adopted to learn the feature representation of the new dataset.This is achieved by introducing a small number of additional parameters while retaining prior knowledge of the CLIP model,which effectively alleviates the overfitting phenomenon.Secondly,a bidirectional gated feature fusion mechanism is designed to dynamically adjust the interaction weights of text and image features to achieve finer cross-modal fusion.Experimental results show that the method significantly outperforms existing methods on two public datasets,verifying its effectiveness and robustness.
文摘Large models,such as large language models(LLMs),vision-language models(VLMs),and multimodal agents,have become key elements in artificial intelli⁃gence(AI)systems.Their rapid development has greatly improved perception,generation,and decision-making in various fields.However,their vast scale and complexity bring about new security challenges.Issues such as backdoor vulnerabilities during training,jailbreaking in multimodal rea⁃soning,and data provenance and copyright auditing have made security a critical focus for both academia and industry.
文摘1.The development history of enhanced recovery after surgery(ERAS)Enhanced recovery after surgery(ERAS)is a multimodal perioperative care approach that has evolved over the past 2 decades since its inception.In 1997,Professor Henrik Kehlet,also known as the“father of ERAS”,from the University of Copenhagen in Denmark first proposed the ERAS concept and discovered its clinical feasibility and superiority,achieving remarkable results.ERAS was initially applied in colorectal surgery;subsequently,the concept gradually gained popularity and application worldwide.
文摘Visual question answering(VQA)is a multimodal task,involving a deep understanding of the image scene and the question’s meaning and capturing the relevant correlations between both modalities to infer the appropriate answer.In this paper,we propose a VQA system intended to answer yes/no questions about real-world images,in Arabic.To support a robust VQA system,we work in two directions:(1)Using deep neural networks to semantically represent the given image and question in a fine-grainedmanner,namely ResNet-152 and Gated Recurrent Units(GRU).(2)Studying the role of the utilizedmultimodal bilinear pooling fusion technique in the trade-o.between the model complexity and the overall model performance.Some fusion techniques could significantly increase the model complexity,which seriously limits their applicability for VQA models.So far,there is no evidence of how efficient these multimodal bilinear pooling fusion techniques are for VQA systems dedicated to yes/no questions.Hence,a comparative analysis is conducted between eight bilinear pooling fusion techniques,in terms of their ability to reduce themodel complexity and improve themodel performance in this case of VQA systems.Experiments indicate that these multimodal bilinear pooling fusion techniques have improved the VQA model’s performance,until reaching the best performance of 89.25%.Further,experiments have proven that the number of answers in the developed VQA system is a critical factor that a.ects the effectiveness of these multimodal bilinear pooling techniques in achieving their main objective of reducing the model complexity.The Multimodal Local Perception Bilinear Pooling(MLPB)technique has shown the best balance between the model complexity and its performance,for VQA systems designed to answer yes/no questions.
基金Supported by the Outstanding Young Talents Program of Air Force Medical Center,People’s Liberation Army,No.22BJQN004Clinical Program of Air Force Medical University,No.Xiaoke2022-07.
文摘BACKGROUND Primary gastrointestinal lymphoma(PGIL)is a relatively uncommon clinical entity,exhibiting distinctive features including occult primary sites,nonspecific clinical presentations,and considerable diagnostic and therapeutic difficulties.Consequently,comprehensive clinical investigations into its clinicopathological characteristics and surgical intervention value are warranted to enhance dia-gnostic and therapeutic proficiency.AIM To investigate the clinicopathological characteristics and surgical significance of PGIL from a surgical perspective,providing a theoretical basis for optimizing diagnostic and therapeutic strategies.METHODS This study included 50 cases of PGIL treated by the General Surgery Department of the Chinese PLA Air Force Medical Center from June 2001 to March 2025.Data were extracted from the Electronic Medical Record system for retrospective analysis.A retrospective analysis was conducted on their epidemiological,clinical manifestations,imaging,pathological features,and treatment outcomes.Descriptive statistics were applied for data summarization,with continuous variables presented as frequencies and percentages.Correlations between variables were assessed using the Spearman rank correlation coefficient.RESULTS All cases had the gastrointestinal tract as the primary site.Abdominal pain was the most common initial symptom(52.0%),with 80.0%of patients experiencing pain during the course of the disease,and 38.0%experiencing hema-tochezia/melena or anemia.Computed tomography diagnosis exhibited a high overall sensitivity(94.3%);the en-doscopic detection rate was 91.5%.Diffuse large B-cell lymphoma was the most common subtype(52.0%).The im-provement rate was higher in the surgery combined with chemotherapy group than in the chemotherapy only group.The incidence of postoperative complications was 26.5%,all occurring in patients with tumors>5 cm.CONCLUSION Diffuse large B-cell lymphoma is the primary PGIL subtype.Imaging and endoscopic biopsy are diagnostic es-sentials.Surgery aids in resection,complication management,and pathologic diagnosis.Multidisciplinary,indi-vidualized strategies are recommended,necessitating further prospective molecular studies.
基金supported by the National Natural Science Foundation of China(Nos.51375383).
文摘In this study,we present a small,integrated jumping-crawling robot capable of intermittent jumping and self-resetting.Compared to robots with a single mode of locomotion,this multi-modal robot exhibits enhanced obstacle-surmounting capabilities.To achieve this,the robot employs a novel combination of a jumping module and a crawling module.The jumping module features improved energy storage capacity and an active clutch.Within the constraints of structural robustness,the jumping module maximizes the explosive power of the linear spring by utilizing the mechanical advantage of a closed-loop mechanism and controls the energy flow of the jumping module through an active clutch mechanism.Furthermore,inspired by the limb movements of tortoises during crawling and self-righting,a single-degree-of-freedom spatial four-bar crawling mechanism was designed to enable crawling,steering,and resetting functions.To demonstrate its practicality,the integrated jumping-crawling robot was tested in a laboratory environment for functions such as jumping,crawling,self-resetting,and steering.Experimental results confirmed the feasibility of the proposed integrated jumping-crawling robot.
文摘With the popularization of social media,stickers have become an important tool for young students to express themselves and resist mainstream culture due to their unique visual and emotional expressiveness.Most existing studies focus on the negative impacts of spoof stickers,while paying insufficient attention to their positive functions.From the perspective of multimodal metaphor,this paper uses methods such as virtual ethnography and image-text analysis to clarify the connotation of stickers,understand the evolution of their digital dissemination forms,and explore the multiple functions of subcultural stickers in the social interactions between teachers and students.Young students use stickers to convey emotions and information.Their expressive function,social function,and cultural metaphor function progress in a progressive manner.This not only shapes students’values but also promotes self-expression and teacher-student interaction.It also reminds teachers to correct students’negative thoughts by using stickers,achieving the effect of“cultivating and influencing people through culture.”
基金National Natural Science Foundation of China,No.42361040。
文摘Population migration data derived from location-based services has often been used to delineate population flows between cities or construct intercity relationship networks to reveal and explore the complex interaction patterns underlying human activities.Nevertheless,the inherent heterogeneity in multimodal migration big data has been ignored.This study conducts an in-depth comparison and quantitative analysis through a comprehensive lens of spatial association.Initially,the intercity interactive networks in China were constructed,utilizing migration data from Baidu and AutoNavi collected during the same time period.Subsequently,the characteristics and spatial structure similarities of the two types of intercity interactive networks were quantitatively assessed and analyzed from overall(network)and local(node)perspectives.Furthermore,the precision of these networks at the local scale is corroborated by constructing an intercity network from mobile phone(MP)data.Results indicate that the intercity interactive networks in China,as delineated by Baidu and AutoNavi migration flows,exhibit a high degree of structure equivalence.The correlation coefficient between these two networks is 0.874.Both networks exhibit a pronounced spatial polarization trend and hierarchical structure.This is evident in their distinct core and peripheral structures,as well as in the varying importance and influence of different nodes within the networks.Nevertheless,there are notable differences worthy of attention.Baidu intercity interactive network exhibits pronounced cross-regional effects,and its high-level interactions are characterized by a“rich-club”phenomenon.The AutoNavi intercity interactive network presents a more significant distance attenuation effect,and the high-level interactions display a gradient distribution pattern.Notably,there exists a substantial correlation between the AutoNavi and MP networks at the local scale,evidenced by a high correlation coefficient of 0.954.Furthermore,the“spatial dislocations”phenomenon was observed within the spatial structures at different levels,extracted from the Baidu and AutoNavi intercity networks.However,the measured results of network spatial structure similarity from three dimensions,namely,node location,node size,and local structure,indicate a relatively high similarity and consistency between the two networks.
基金funded by the Open Foundation of Key Laboratory of Cyberspace Security,Ministry of Education[KLCS20240210].
文摘With the rapid growth of the Internet and social media, information is widely disseminated in multimodal forms, such as text and images, where discriminatory content can manifest in various ways. Discrimination detection techniques for multilingual and multimodal data can identify potential discriminatory behavior and help foster a more equitable and inclusive cyberspace. However, existing methods often struggle in complex contexts and multilingual environments. To address these challenges, this paper proposes an innovative detection method, using image and multilingual text encoders to separately extract features from different modalities. It continuously updates a historical feature memory bank, aggregates the Top-K most similar samples, and utilizes a Gated Recurrent Unit (GRU) to integrate current and historical features, generating enhanced feature representations with stronger semantic expressiveness to improve the model’s ability to capture discriminatory signals. Experimental results demonstrate that the proposed method exhibits superior discriminative power and detection accuracy in multilingual and multimodal contexts, offering a reliable and effective solution for identifying discriminatory content.
文摘Aiming at the problems of traditional guide devices such as single environmental perception and poor terrain adaptability,this paper proposes an intelligent guide system based on a quadruped robot platform.Data fusion between millimeter-wave radar(with an accuracy of±0.1°)and an RGB-D camera is achieved through multisensor spatiotemporal registration technology,and a dataset suitable for guide dog robots is constructed.For the application scenario of edge-end guide dog robots,a lightweight CA-YOLOv11 target detection model integrated with an attention mechanism is innovatively adopted,achieving a comprehensive recognition accuracy of 95.8% in complex scenarios,which is 2.2% higher than that of the benchmark YOLOv11 network.The system supports navigation on complex terrains such as stairs(25 cm steps)and slopes(35°gradient),and the response time to sudden disturbances is shortened to 100 ms.Actual tests show that the navigation success rate reaches 95% in eight types of scenarios,the user satisfaction score is 4.8/5.0,and the cost is 50% lower than that of traditional guide dogs.
文摘A complete examination of Large Language Models’strengths,problems,and applications is needed due to their rising use across disciplines.Current studies frequently focus on single-use situations and lack a comprehensive understanding of LLM architectural performance,strengths,and weaknesses.This gap precludes finding the appropriate models for task-specific applications and limits awareness of emerging LLM optimization and deployment strategies.In this research,50 studies on 25+LLMs,including GPT-3,GPT-4,Claude 3.5,DeepKet,and hybrid multimodal frameworks like ContextDET and GeoRSCLIP,are thoroughly reviewed.We propose LLM application taxonomy by grouping techniques by task focus—healthcare,chemistry,sentiment analysis,agent-based simulations,and multimodal integration.Advanced methods like parameter-efficient tuning(LoRA),quantumenhanced embeddings(DeepKet),retrieval-augmented generation(RAG),and safety-focused models(GalaxyGPT)are evaluated for dataset requirements,computational efficiency,and performance measures.Frameworks for ethical issues,data limited hallucinations,and KDGI-enhanced fine-tuning like Woodpecker’s post-remedy corrections are highlighted.The investigation’s scope,mad,and methods are described,but the primary results are not.The work reveals that domain-specialized fine-tuned LLMs employing RAG and quantum-enhanced embeddings performbetter for context-heavy applications.In medical text normalization,ChatGPT-4 outperforms previous models,while two multimodal frameworks,GeoRSCLIP,increase remote sensing.Parameter-efficient tuning technologies like LoRA have minimal computing cost and similar performance,demonstrating the necessity for adaptive models in multiple domains.To discover the optimum domain-specific models,explain domain-specific fine-tuning,and present quantum andmultimodal LLMs to address scalability and cross-domain issues.The framework helps academics and practitioners identify,adapt,and innovate LLMs for different purposes.This work advances the field of efficient,interpretable,and ethical LLM application research.