Journal Articles
907,490 articles found
1. VOTI: Jailbreaking Vision-Language Models via Visual Obfuscation and Task Induction
Authors: ZHU Yifan, CHU Zhixuan, REN Kui. 《ZTE Communications》, 2025, Issue 3, pp. 15-26 (12 pages)
In recent years, large vision-language models (VLMs) have achieved significant breakthroughs in cross-modal understanding and generation. However, the safety issues arising from their multimodal interactions have become prominent. VLMs are vulnerable to jailbreak attacks, where attackers craft carefully designed prompts to bypass safety mechanisms, leading them to generate harmful content. To address this, we investigate the alignment between visual inputs and task execution, uncovering locality defects and attention biases in VLMs. Based on these findings, we propose VOTI, a novel jailbreak framework leveraging visual obfuscation and task induction. VOTI subtly embeds malicious keywords within neutral image layouts to evade detection, and breaks down harmful queries into a sequence of subtasks. This approach disperses malicious intent across modalities, exploiting VLMs' over-reliance on local visual cues and their fragility in multi-step reasoning to bypass global safety mechanisms. Implemented as an automated framework, VOTI integrates large language models as red-team assistants to generate and iteratively optimize jailbreak strategies. Extensive experiments across seven mainstream VLMs demonstrate VOTI's effectiveness, achieving a 73.46% attack success rate on GPT-4o-mini. These results reveal critical vulnerabilities in VLMs, highlighting the urgent need for robust defenses and improved multimodal alignment.
Keywords: large vision-language models; jailbreak attacks; red teaming; security of large models; safety alignment
2. Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey [Cited by 1]
Authors: Junming FAN, Yue YIN, Tian WANG, Wenhang DONG, Pai ZHENG, Lihui WANG. 《Frontiers of Engineering Management》, 2025, Issue 1, pp. 177-200 (24 pages)
Human-robot collaboration (HRC) is set to transform the manufacturing paradigm by leveraging the strengths of human flexibility and robot precision. The recent breakthrough of Large Language Models (LLMs) and Vision-Language Models (VLMs) has motivated preliminary explorations and adoption of these models in the smart manufacturing field. However, despite the considerable amount of effort, existing research has mainly focused on individual components without a comprehensive perspective addressing the full potential of VLMs, especially for HRC in smart manufacturing scenarios. To fill the gap, this work offers a systematic review of the latest advancements and applications of VLMs in HRC for smart manufacturing, covering the fundamental architectures and pretraining methodologies of LLMs and VLMs, their applications in robotic task planning, navigation, and manipulation, and their role in enhancing human-robot skill transfer through multimodal data integration. Lastly, the paper discusses current limitations and future research directions in VLM-based HRC, highlighting the trend toward fully realizing the potential of these technologies for smart manufacturing.
Keywords: vision-language models; large language models; human-robot collaboration; smart manufacturing
3. The Synergy of Seeing and Saying: Revolutionary Advances in Multi-modality Medical Vision-Language Large Models
Authors: Xiang LI, Yu SUN, Jia LIN, Like LI, Ting FENG, Shen YIN. 《Artificial Intelligence Science and Engineering》, 2025, Issue 2, pp. 79-97 (19 pages)
The application of vision-language large models in the field of medical health has gradually become a research focus. These models combine capabilities for image understanding and natural language processing, and can simultaneously process multi-modality data such as medical images and medical reports. They can not only recognize images, but also understand the semantic relationship between images and texts, effectively realizing the integration of medical information and providing strong support for clinical decision-making and disease diagnosis. Vision-language large models perform well on specific medical tasks, and also show strong potential and high intelligence as general task models. This paper provides a comprehensive review of vision-language large models in the field of medical health. Specifically, it first introduces the basic theoretical foundations and technical principles. Then, it introduces specific application scenarios in the field of medical health, including modality fusion, semi-supervised learning, weakly supervised learning, unsupervised learning, cross-domain models, and general models. Finally, challenges including insufficient data, interpretability, and practical deployment are discussed. Based on the existing challenges, four potential future development directions are given.
Keywords: large language models; vision-language models; medical health; multimodality models
4. A Review on Vision-Language-Based Approaches: Challenges and Applications
Authors: Huu-Tuong Ho, Luong Vuong Nguyen, Minh-Tien Pham, Quang-Huy Pham, Quang-Duong Tran, Duong Nguyen Minh Huy, Tri-Hai Nguyen. 《Computers, Materials & Continua》, 2025, Issue 2, pp. 1733-1756 (24 pages)
In multimodal learning, Vision-Language Models (VLMs) have become a critical research focus, enabling the integration of textual and visual data. These models have shown significant promise across various natural language processing tasks, such as visual question answering, and computer vision applications, including image captioning and image-text retrieval, highlighting their adaptability for complex, multimodal datasets. In this work, we review the landscape of Bootstrapping Language-Image Pre-training (BLIP) and other VLM techniques. A comparative analysis is conducted to assess VLMs' strengths, limitations, and applicability across tasks while examining challenges such as scalability, data quality, and fine-tuning complexities. The work concludes by outlining potential future directions in VLM research, focusing on enhancing model interpretability, addressing ethical implications, and advancing multimodal integration in real-world applications.
Keywords: Bootstrapping Language-Image Pre-training (BLIP); multimodal learning; vision-language model (VLM); vision-language pre-training (VLP)
5. Effectiveness assessment of recent large vision-language models [Cited by 1]
Authors: Yao Jiang, Xinyu Yan, Ge-Peng Ji, Keren Fu, Meijun Sun, Huan Xiong, Deng-Ping Fan, Fahad Shahbaz Khan. 《Visual Intelligence》, 2024, Issue 1, pp. 197-213 (17 pages)
The advent of large vision-language models (LVLMs) represents a remarkable advance in the quest for artificial general intelligence. However, the models' effectiveness in both specialized and general tasks warrants further investigation. This paper endeavors to evaluate the competency of popular LVLMs in specialized and general tasks, respectively, aiming to offer a comprehensive understanding of these novel models. To gauge their effectiveness in specialized tasks, we employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial. These six tasks include salient/camouflaged/transparent object detection, as well as polyp detection, skin lesion detection, and industrial anomaly detection. We examine the performance of three recent open-source LVLMs, MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks. Moreover, we conduct empirical investigations utilizing the aforementioned LVLMs together with GPT-4V, assessing their multi-modal understanding capabilities in general tasks including object counting, absurd question answering, affordance reasoning, attribute recognition, and spatial relation reasoning. Our investigations reveal that these LVLMs demonstrate limited proficiency not only in specialized tasks but also in general tasks. We delve deep into this inadequacy and uncover several potential factors, including limited cognition in specialized tasks, object hallucination, text-to-image interference, and decreased robustness in complex problems. We hope that this study can provide useful insights for the future development of LVLMs, helping researchers improve LVLMs for both general and specialized applications.
Keywords: large vision-language models (LVLMs); recognition; localization; multi-modal understanding
6. IQAGPT: computed tomography image quality assessment with vision-language and ChatGPT models
Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang. 《Visual Computing for Industry, Biomedicine, and Art》, 2024, Issue 1, pp. 165-181 (17 pages)
Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) that learn rich vision-language correlation from image-text pairs, like BLIP-2 and GPT-4, have been intensively investigated. However, despite these developments, the application of LLMs and VLMs in image quality assessment (IQA), particularly in medical imaging, remains unexplored. This is valuable for objective performance evaluation and as a potential supplement to, or even replacement of, radiologists' opinions. To this end, this study introduces IQAGPT, an innovative computed tomography (CT) IQA system that integrates an image-quality captioning VLM with ChatGPT to generate quality scores and textual reports. First, a CT-IQA dataset comprising 1,000 CT slices with diverse quality levels was professionally annotated and compiled for training and evaluation. To better leverage the capabilities of LLMs, the annotated quality scores are converted into semantically rich text descriptions using a prompt template. Second, the image-quality captioning VLM is fine-tuned on the CT-IQA dataset to generate quality descriptions. The captioning model fuses image and text features through cross-modal attention. Third, based on the quality descriptions, users verbally request ChatGPT to rate image-quality scores or produce radiological quality reports. Results demonstrate the feasibility of assessing image quality using LLMs. The proposed IQAGPT outperformed GPT-4 and CLIP-IQA, as well as multitask classification and regression models that rely solely on images.
Keywords: deep learning; medical imaging; image captioning; multimodality; large language model; vision-language model; GPT-4; subjective evaluation
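The abstract above mentions converting annotated quality scores into text descriptions via a prompt template. A minimal sketch of such a mapping is shown below; the band labels, thresholds, and sentence wording are illustrative assumptions, not the paper's actual template.

```python
def quality_prompt(score, max_score=5):
    """Map a numeric CT image-quality score to a short text description.

    The band boundaries and labels here are hypothetical, chosen only to
    illustrate the score-to-text idea described in the abstract.
    """
    bands = [(4.5, "excellent"), (3.5, "good"), (2.5, "fair"), (1.5, "poor")]
    label = next((name for floor, name in bands if score >= floor), "non-diagnostic")
    return f"This CT slice has {label} image quality (score {score}/{max_score})."

text = quality_prompt(4.0)
```

Such a template turns a bare label into a sentence an LLM can condition on, which is the stated motivation for the conversion step.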
7. Video action recognition meets vision-language models exploring human factors in scene interaction: a review
Authors: GUO Yuping, GAO Hongwei, YU Jiahui, GE Jinchao, HAN Meng, JU Zhaojie. 《Optoelectronics Letters》, 2025, Issue 10, pp. 626-640 (15 pages)
Video action recognition (VAR) aims to analyze dynamic behaviors in videos and achieve semantic understanding. VAR faces challenges such as temporal dynamics, action-scene coupling, and the complexity of human interactions. Existing methods can be categorized into motion-level, event-level, and story-level ones based on spatiotemporal granularity. However, single-modal approaches struggle to capture complex behavioral semantics and human factors. Therefore, in recent years, vision-language models (VLMs) have been introduced into this field, providing new research perspectives for VAR. In this paper, we systematically review spatiotemporal hierarchical methods in VAR and explore how the introduction of large models has advanced the field. Additionally, we propose the concept of "Factor" to identify and integrate key information from both visual and textual modalities, enhancing multimodal alignment. We also summarize various multimodal alignment methods and provide in-depth analysis and insights into future research directions.
Keywords: human factors; video action recognition (VAR); vision-language models; spatiotemporal granularity; multimodal alignment; scene interaction
8. CLIP-SP: Vision-language model with adaptive prompting for scene parsing
Authors: Jiaao Li, Yixiang Huang, Ming Wu, Bin Zhang, Xu Ji, Chuang Zhang. 《Computational Visual Media》 (SCIE, EI, CSCD), 2024, Issue 4, pp. 741-752 (12 pages)
We present a novel framework, CLIP-SP, and a novel adaptive prompt method to leverage pre-trained knowledge from CLIP for scene parsing. Our approach addresses the limitations of DenseCLIP, which demonstrates the superior image segmentation provided by CLIP pre-trained models over ImageNet pre-trained models, but struggles with rough pixel-text score maps for complex scene parsing. We argue that, as they contain all textual information in a dataset, the pixel-text score maps, i.e., dense prompts, are inevitably mixed with noise. To overcome this challenge, we propose a two-step method. First, we extract visual and language features and perform multi-label classification to identify the most likely categories in the input images. Second, based on the top-k categories and confidence scores, our method generates scene tokens which can be treated as adaptive prompts for implicit modeling of scenes, and incorporates them into the visual features fed into the decoder for segmentation. Our method imposes a constraint on prompts and suppresses the probability of irrelevant categories appearing in the scene parsing results. Our method achieves competitive performance, limited by the available visual-language pre-trained models. Our CLIP-SP performs 1.14% better (in terms of mIoU) than DenseCLIP on ADE20K, using a ResNet-50 backbone.
Keywords: visual-language pre-trained model; scene parsing; adaptive prompt
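The first step of the two-step method above, keeping only the top-k most likely categories by confidence, can be sketched in a few lines. The function name, confidence floor, and category scores below are illustrative assumptions, not values from the paper.

```python
def select_topk_categories(scores, k=3, min_conf=0.1):
    """Return the top-k (category, confidence) pairs above a confidence floor.

    A sketch of the top-k category filtering described in the abstract;
    in CLIP-SP these categories would seed the adaptive scene tokens.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, s) for c, s in ranked[:k] if s >= min_conf]

# Hypothetical multi-label confidences for one image
scores = {"sky": 0.91, "building": 0.85, "tree": 0.42, "boat": 0.03}
prompts = select_topk_categories(scores, k=3)
```

Filtering this way is what suppresses irrelevant categories before prompts reach the decoder.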
9. Hybrid Model-Based Estimation of Total Solar Radiation in Zhejiang Province and Its Spatiotemporal Distribution Characteristics
Authors: 顾婷婷, 潘娅英, 张加易. 《气象科学》, 2025, Issue 2, pp. 176-181 (6 pages)
Using observations from two radiation stations in Zhejiang Province, the applicability of the Hybrid Model for surface solar radiation in Zhejiang was evaluated. On this basis, the Hybrid Model was used to reconstruct a daily surface solar radiation dataset for 71 stations in Zhejiang Province over 1971-2020, and its spatiotemporal variation characteristics were analyzed. The results show that the Hybrid Model performs well. Compared with the A-P model, the mean error, root mean square error, and mean absolute percentage error at Hangzhou station are 2.01 MJ·m^(-2), 2.69 MJ·m^(-2), and 18.02%, respectively, and at Hongjia station 1.41 MJ·m^(-2), 1.85 MJ·m^(-2), and 11.56%, all lower than those of the A-P model, with smaller month-to-month error fluctuations for the Hybrid Model. The 50-year mean total surface radiation in Zhejiang ranges from 3,733 to 5,060 MJ·m^(-2), with high values mainly in the northern Zhejiang plain and the coastal island areas. From 1971 to 2020, total solar radiation in Zhejiang showed a clear decreasing trend, with a climatic tendency rate of -72 MJ·m^(-2) per decade and abrupt decreases in the early 1980s and the mid-2000s.
Keywords: Hybrid Model; total solar radiation; error analysis; spatiotemporal distribution
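The three error metrics reported above (mean error, root mean square error, mean absolute percentage error) have standard definitions and can be reproduced in a few lines. The sample values below are hypothetical, not the station data from the study.

```python
import math

def error_metrics(observed, simulated):
    """Mean error, RMSE, and MAPE (%) between paired series; units follow the inputs."""
    n = len(observed)
    diffs = [s - o for o, s in zip(observed, simulated)]
    me = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mape = 100.0 * sum(abs(d) / abs(o) for o, d in zip(observed, diffs)) / n
    return me, rmse, mape

# Hypothetical daily totals in MJ·m^-2 (not the Hangzhou/Hongjia observations)
obs = [12.0, 15.5, 9.8, 20.1]
sim = [13.1, 14.9, 11.0, 19.5]
me, rmse, mape = error_metrics(obs, sim)
```

Comparing these three metrics per station is exactly how the abstract ranks the Hybrid Model against the A-P model.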
10. Text Mining of Hot Work Accident Causation Based on the 24Model [Cited by 1]
Authors: 牛茂辉, 李威君, 刘音, 王璐. 《中国安全科学学报》 (北大核心), 2025, Issue 3, pp. 151-158 (8 pages)
To explore the root causes of industrial hot work accidents, a text mining method based on the "2-4" Model (24Model) is proposed. First, 220 hot work accident reports were collected and compiled as a dataset, and a 24Model classifier based on Bidirectional Encoder Representations from Transformers (BERT) was built; the pre-trained model was used to train and evaluate on the accident report dataset to construct the classification model. Then, by combining the weights from a BERT-based keyword extraction algorithm (KeyBERT) and the term frequency-inverse document frequency (TF-IDF) algorithm within the 24Model framework, a keyword index system for hot work accident texts was established. Finally, the co-occurrence network of keywords mined from the texts was used to analyze the interrelations among accident causes. The results show that the BERT-based 24Model classifier can systematically and accurately identify the categories of hot work accident causes. Combined-weight screening yielded a four-level keyword index system, in which the safety management system carries the largest weight, and co-occurrence network analysis identified seven key causes of hot work accidents.
Keywords: "2-4" Model (24Model); hot work; accident causation; text mining; index system
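The abstract above screens keywords by a combined weight of KeyBERT and TF-IDF scores. A minimal sketch of one plausible combination, a fixed linear blend, is shown below; the blend weight, the scores, and the keywords are all hypothetical, and the paper's actual weighting scheme may differ.

```python
def combine_keyword_weights(scores_a, scores_b, alpha=0.5):
    """Linearly blend two keyword-score dicts; missing keywords score 0."""
    keys = set(scores_a) | set(scores_b)
    return {k: alpha * scores_a.get(k, 0.0) + (1 - alpha) * scores_b.get(k, 0.0)
            for k in keys}

# Hypothetical per-keyword scores from the two extractors
keybert_scores = {"safety management": 0.82, "hot work permit": 0.61}
tfidf_scores = {"safety management": 0.74, "ventilation": 0.40}
combined = combine_keyword_weights(keybert_scores, tfidf_scores, alpha=0.6)
top = max(combined, key=combined.get)
```

Ranking keywords by the blended score is what would place a term like "safety management system" at the top of the index system, as the abstract reports.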
11. VLCA: vision-language aligning model with cross-modal attention for bilingual remote sensing image captioning [Cited by 3]
Authors: WEI Tingting, YUAN Weilin, LUO Junren, ZHANG Wanpeng, LU Lina. 《Journal of Systems Engineering and Electronics》 (SCIE, EI, CSCD), 2023, Issue 1, pp. 9-18 (10 pages)
In the field of satellite imagery, remote sensing image captioning (RSIC) is a hot topic, with the challenges of overfitting and of aligning image and text. To address these issues, this paper proposes a vision-language aligning paradigm for RSIC to jointly represent vision and language. First, a new RSIC dataset, DIOR-Captions, is built by augmenting the object Detection In Optical Remote (DIOR) sensing images dataset with manually annotated Chinese and English contents. Second, a Vision-Language aligning model with Cross-modal Attention (VLCA) is presented to generate accurate and abundant bilingual descriptions for remote sensing images. Third, a cross-modal learning network is introduced to address the problem of visual-lingual alignment. Notably, VLCA is also applied to end-to-end Chinese caption generation using a pre-trained Chinese language model. Experiments are carried out with various baselines to validate VLCA on the proposed dataset. The results demonstrate that the proposed algorithm is more descriptive and informative than existing algorithms in producing captions.
Keywords: remote sensing image captioning (RSIC); vision-language representation; remote sensing image caption dataset; attention mechanism
12. Prognostic model for esophagogastric variceal rebleeding after endoscopic treatment in liver cirrhosis: A Chinese multicenter study [Cited by 2]
Authors: Jun-Yi Zhan, Jie Chen, Jin-Zhong Yu, Fei-Peng Xu, Fei-Fei Xing, De-Xin Wang, Ming-Yan Yang, Feng Xing, Jian Wang, Yong-Ping Mu. 《World Journal of Gastroenterology》 (SCIE, CAS), 2025, Issue 2, pp. 85-101 (17 pages)
BACKGROUND: Rebleeding after recovery from esophagogastric variceal bleeding (EGVB) is a severe complication that is associated with high rates of both incidence and mortality. Despite its clinical importance, recognized prognostic models that can effectively predict esophagogastric variceal rebleeding in patients with liver cirrhosis are lacking. AIM: To construct and externally validate a reliable prognostic model for predicting the occurrence of esophagogastric variceal rebleeding. METHODS: This study included 477 EGVB patients across 2 cohorts: the derivation cohort (n=322) and the validation cohort (n=155). The primary outcome was rebleeding events within 1 year. The least absolute shrinkage and selection operator was applied for predictor selection, and multivariate Cox regression analysis was used to construct the prognostic model. Internal validation was performed with bootstrap resampling. We assessed the discrimination, calibration and accuracy of the model, and performed patient risk stratification. RESULTS: Six predictors, including albumin and aspartate aminotransferase concentrations, white blood cell count, and the presence of ascites, portal vein thrombosis, and bleeding signs, were selected for the rebleeding event prediction following endoscopic treatment (REPET) model. In predicting rebleeding within 1 year, the REPET model exhibited a concordance index of 0.775 and a Brier score of 0.143 in the derivation cohort, alongside 0.862 and 0.127 in the validation cohort. Furthermore, the REPET model revealed a significant difference in rebleeding rates (P<0.01) between low-risk patients and intermediate- to high-risk patients in both cohorts. CONCLUSION: We constructed and validated a new prognostic model for variceal rebleeding with excellent predictive performance, which will improve the clinical management of rebleeding in EGVB patients.
Keywords: esophagogastric variceal bleeding; variceal rebleeding; liver cirrhosis; prognostic model; risk stratification; secondary prophylaxis
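The concordance index reported for the REPET model measures how often a patient with a higher predicted risk actually rebleeds sooner. A minimal sketch for the uncensored case is below (real survival c-index implementations also handle censoring, which this sketch omits); the times and risk scores are hypothetical.

```python
def concordance_index(times, risks):
    """C-index for uncensored data: fraction of comparable pairs where the
    higher-risk patient has the shorter time to the event (ties count half)."""
    pairs = concordant = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # not comparable without censoring information
            pairs += 1
            shorter, longer = (i, j) if times[i] < times[j] else (j, i)
            if risks[shorter] > risks[longer]:
                concordant += 1
            elif risks[shorter] == risks[longer]:
                concordant += 0.5
    return concordant / pairs

# Hypothetical months-to-rebleeding and model risk scores
times = [2, 5, 9, 12]
risks = [0.8, 0.9, 0.3, 0.1]
ci = concordance_index(times, risks)
```

A value of 0.5 means random ranking and 1.0 perfect ranking, which is why the reported 0.775 and 0.862 indicate useful discrimination.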
13. Landslide Susceptibility Mapping Using RBFN-Based Ensemble Machine Learning Models [Cited by 1]
Authors: Duc-Dam Nguyen, Nguyen Viet Tiep, Quynh-Anh Thi Bui, Hiep Van Le, Indra Prakash, Romulus Costache, Manish Pandey, Binh Thai Pham. 《Computer Modeling in Engineering & Sciences》 (SCIE, EI), 2025, Issue 1, pp. 467-500 (34 pages)
This study aimed to prepare landslide susceptibility maps for the Pithoragarh district in Uttarakhand, India, using advanced ensemble models that combined Radial Basis Function Networks (RBFN) with three ensemble learning techniques: DAGGING (DG), MULTIBOOST (MB), and ADABOOST (AB). This combination resulted in three distinct ensemble models: DG-RBFN, MB-RBFN, and AB-RBFN. Additionally, a traditional weighted method, Information Value (IV), and a benchmark machine learning (ML) model, Multilayer Perceptron Neural Network (MLP), were employed for comparison and validation. The models were developed using ten landslide conditioning factors, which included slope, aspect, elevation, curvature, land cover, geomorphology, overburden depth, lithology, distance to rivers, and distance to roads. These factors were instrumental in predicting the output variable, the probability of landslide occurrence. Statistical analysis of the models' performance indicated that the DG-RBFN model, with an Area Under the ROC Curve (AUC) of 0.931, outperformed the other models. The AB-RBFN model achieved an AUC of 0.929, the MB-RBFN model had an AUC of 0.913, and the MLP model recorded an AUC of 0.926. These results suggest that the advanced ensemble ML model DG-RBFN was more accurate than the traditional statistical model, the single MLP model, and the other ensemble models in preparing trustworthy landslide susceptibility maps, thereby enhancing land use planning and decision-making.
Keywords: landslide susceptibility map; spatial analysis; ensemble modelling; information value (IV)
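The AUC values used above to rank DG-RBFN against the other models can be computed without plotting a ROC curve, via the Mann-Whitney formulation: the probability that a randomly chosen positive (landslide) pixel scores higher than a randomly chosen negative one. A minimal sketch follows; the labels and scores are hypothetical, not the study's data.

```python
def auc_score(labels, scores):
    """Mann-Whitney AUC: probability a positive outranks a negative (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground-truth labels (1 = landslide) and model susceptibility scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = auc_score(labels, scores)
```

On this tiny example three of the four positive-negative pairs are correctly ordered, so the AUC is 0.75; an AUC of 0.931, as reported for DG-RBFN, means almost all such pairs are ordered correctly.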
14. An integrated method of data-driven and mechanism models for formation evaluation with logs [Cited by 1]
Authors: Meng-Lu Kang, Jun Zhou, Juan Zhang, Li-Zhi Xiao, Guang-Zhi Liao, Rong-Bo Shao, Gang Luo. 《Petroleum Science》, 2025, Issue 3, pp. 1110-1124 (15 pages)
We propose an integrated method of data-driven and mechanism models for well-logging formation evaluation, explicitly focusing on predicting reservoir parameters such as porosity and water saturation. Accurately interpreting these parameters is crucial for effectively exploring and developing oil and gas. However, with the increasing complexity of geological conditions in this industry, there is a growing demand for improved accuracy in reservoir parameter prediction, leading to higher costs associated with manual interpretation. Conventional logging interpretation methods rely on empirical relationships between logging data and reservoir parameters, which suffer from low interpretation efficiency, strong subjectivity, and suitability only for ideal conditions. The application of artificial intelligence to the interpretation of logging data provides a new solution to the problems of traditional methods and is expected to improve the accuracy and efficiency of interpretation. If large and high-quality datasets exist, data-driven models can reveal relationships of arbitrary complexity. Nevertheless, constructing sufficiently large logging datasets with reliable labels remains challenging, making it difficult to apply data-driven models effectively in logging data interpretation. Furthermore, data-driven models often act as "black boxes" without explaining their predictions or ensuring compliance with primary physical constraints. This paper proposes a machine learning method with strong physical constraints by integrating mechanism and data-driven models. Prior knowledge of logging data interpretation is embedded into machine learning in terms of network structure, loss function, and optimization algorithm. We employ the Physically Informed Auto-Encoder (PIAE) to predict porosity and water saturation, which can be trained without labeled reservoir parameters using self-supervised learning techniques. This approach effectively achieves automated interpretation and facilitates generalization across diverse datasets.
Keywords: well log; reservoir evaluation; label scarcity; mechanism model; data-driven model; physically informed model; self-supervised learning; machine learning
15. Predictability Study of Weather and Climate Events Related to Artificial Intelligence Models [Cited by 2]
Authors: Mu MU, Bo QIN, Guokun DAI. 《Advances in Atmospheric Sciences》, 2025, Issue 1, pp. 1-8 (8 pages)
Conducting predictability studies is essential for tracing the sources of forecast errors, which not only leads to the improvement of observation and forecasting systems, but also enhances the understanding of weather and climate phenomena. In the past few decades, dynamical numerical models have been the primary tools for predictability studies, achieving significant progress. Nowadays, with advances in artificial intelligence (AI) techniques and the accumulation of vast meteorological data, modeling weather and climate events using modern data-driven approaches is becoming trendy, with FourCastNet, Pangu-Weather, and GraphCast as successful pioneers. In this perspective article, we suggest that AI models should not be limited to forecasting but be expanded to predictability studies, leveraging AI's advantages of high efficiency and self-contained optimization modules. To this end, we first remark that AI models should possess high simulation capability with fine spatiotemporal resolution for two kinds of predictability studies. AI models with high simulation capabilities comparable to numerical models can be considered to provide solutions to partial differential equations in a data-driven way. Then, we highlight several specific predictability issues with well-determined nonlinear optimization formulations, which can be well studied using AI models and hold significant scientific value. In addition, we advocate for the incorporation of AI models into the synergistic cycle of the cognition-observation-model paradigm. Comprehensive predictability studies have the potential to transform "big data" into "big and better data" and shift the focus from "AI for forecasts" to "AI for science", ultimately advancing the development of the atmospheric and oceanic sciences.
Keywords: predictability; artificial intelligence models; simulation and forecasting; nonlinear optimization; cognition-observation-model paradigm
16. Sensorless battery expansion estimation using electromechanical coupled models and machine learning [Cited by 1]
Authors: Xue Cai, Caiping Zhang, Jue Chen, Zeping Chen, Linjing Zhang, Dirk Uwe Sauer, Weihan Li. 《Journal of Energy Chemistry》, 2025, Issue 6, pp. 142-157, I0004 (17 pages)
Developing sensorless techniques for estimating battery expansion is essential for effective mechanical state monitoring, improving the accuracy of digital twin simulation and abnormality detection. Therefore, this paper presents a data-driven approach to expansion estimation using electromechanical coupled models with machine learning. The proposed method integrates reduced-order impedance models with data-driven mechanical models, coupling the electrochemical and mechanical states through the state of charge (SOC) and mechanical pressure within a state estimation framework. The coupling relationship was established through experimental insights into pressure-related impedance parameters and the nonlinear mechanical behavior with SOC and pressure. The data-driven model was interpreted by introducing a novel swelling coefficient defined by component stiffnesses to capture the nonlinear mechanical behavior across various mechanical constraints. Sensitivity analysis of the impedance model shows that updating model parameters with pressure can reduce the mean absolute error of simulated voltage by 20 mV and the SOC estimation error by 2%. The results demonstrate the model's estimation capabilities, achieving a root mean square error of less than 1 kPa when the maximum expansion force ranges from 30 kPa to 120 kPa, outperforming calibrated stiffness models and other machine learning techniques. The model's robustness and generalizability are further supported by its effective handling of SOC estimation and pressure measurement errors. This work highlights the importance of the proposed framework in enhancing state estimation and fault diagnosis for lithium-ion batteries.
Keywords: sensorless estimation; electromechanical coupling; impedance model; data-driven model; mechanical pressure
17. A Multi-Level Semantic Constraint Approach for Highway Tunnel Scene Twin Modeling [Cited by 1]
Authors: LI Yufei, XIE Yakun, CHEN Mingzhen, ZHAO Yaoji, TU Jiaxing, HU Ya. 《Journal of Geodesy and Geoinformation Science》, 2025, Issue 2, pp. 37-56 (20 pages)
As key nodes of the modern transportation network, the informatized management of road tunnels is crucial to ensuring operational safety and traffic efficiency. However, existing tunnel vehicle modeling methods generally suffer from insufficient 3D scene description capability and low dynamic update efficiency, making it difficult to meet the demand for real-time, accurate management. For this reason, this paper proposes a vehicle twin modeling method for road tunnels. Starting from actual management needs, the approach supports multi-level dynamic modeling from vehicle type and size to color by constructing a vehicle model library that can be flexibly invoked; at the same time, semantic constraint rules covering geometric layout, behavioral attributes, and spatial relationships are designed to ensure that the virtual model matches the real one with a high degree of similarity. Finally, a prototype system is constructed and case experiments are conducted in selected case areas, integrating real-time monitoring data with the semantic constraints to realize precise virtual-real mapping, dynamic updating, and three-dimensional visualization of vehicle states in tunnels. The experiments show that the proposed method runs smoothly with an average rendering time of 17.70 ms while guaranteeing modeling accuracy (composite similarity of 0.867), significantly improving the real-time performance and intuitiveness of tunnel management. The research results provide reliable technical support for intelligent operation and emergency response of road tunnels, and offer new ideas for digital twin modeling of complex scenes.
Keywords: highway tunnel; twin modeling; multi-level semantic constraints; tunnel vehicles; multidimensional modeling
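The composite similarity reported above (0.867) suggests a weighted combination of per-attribute match scores between the virtual twin and the observed vehicle. The listing does not give the actual formula, so the attribute set and weights below are illustrative assumptions only:

```python
# Illustrative sketch: a weighted composite similarity between a virtual
# vehicle twin and its observed counterpart. The attributes (type, size,
# color) and their weights are assumptions for illustration; the paper's
# actual scoring scheme is not specified in this listing.

def composite_similarity(scores: dict[str, float],
                         weights: dict[str, float]) -> float:
    """Weighted average of per-attribute similarity scores in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same attributes")
    total = sum(weights.values())
    return sum(scores[k] * weights[k] for k in scores) / total

# Hypothetical example: type matched well, size moderately, color less so.
scores = {"type": 0.95, "size": 0.85, "color": 0.70}
weights = {"type": 0.5, "size": 0.3, "color": 0.2}
print(round(composite_similarity(scores, weights), 3))  # → 0.87
```

A real system would derive the per-attribute scores from the semantic constraint rules (geometric layout, behavioral attributes, spatial relationships) mentioned in the abstract.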
Large language models for robotics: Opportunities, challenges, and perspectives (Cited by: 3)
Authors: Jiaqi Wang, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Bao Ge, Shu Zhang. Journal of Automation and Intelligence, 2025, Issue 1, pp. 52-64 (13 pages)
Large language models (LLMs) have undergone significant expansion and have been increasingly integrated across various domains. Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans from natural language instructions. However, for embodied tasks, where robots interact with complex environments, text-only LLMs often face challenges due to a lack of compatibility with robotic visual perception. This study provides a comprehensive overview of the emerging integration of LLMs and multimodal LLMs into various robotic tasks. Additionally, we propose a framework that utilizes the multimodal GPT-4V to enhance embodied task planning by combining natural language instructions with robot visual perceptions. Our results, based on diverse datasets, indicate that GPT-4V effectively enhances robot performance in embodied tasks. This extensive survey and evaluation of LLMs and multimodal LLMs across a variety of robotic tasks enriches the understanding of LLM-centric embodied intelligence and provides forward-looking insights toward bridging the gap in human-robot-environment interaction.
Keywords: large language models; robotics; generative AI; embodied intelligence
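The planning loop described above, natural-language instruction plus visual observation in, ordered action plan out, can be sketched as a thin wrapper around a multimodal chat endpoint. The prompt wording, the numbered-list output format, and the `query_vlm` callable below are illustrative assumptions, not the paper's GPT-4V implementation:

```python
# Illustrative sketch of an LLM-based embodied task planner: an instruction
# and a scene description go in, an ordered list of robot actions comes out.
# The prompt format and parsing are assumptions; a real system would send
# the camera image to a multimodal model such as GPT-4V via query_vlm.
import re
from typing import Callable

def plan_task(instruction: str, scene: str,
              query_vlm: Callable[[str], str]) -> list[str]:
    """Ask a (multimodal) model for a numbered plan and parse the steps."""
    prompt = (f"Scene: {scene}\nInstruction: {instruction}\n"
              "Reply with a numbered list of robot actions.")
    reply = query_vlm(prompt)
    # Extract lines that look like "1. do something".
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+\.\s*(.+)$", reply, re.MULTILINE)]

# Stub model for demonstration only.
def fake_vlm(prompt: str) -> str:
    return "1. locate the cup\n2. grasp the cup\n3. place it on the tray"

print(plan_task("put the cup on the tray", "a cup on a table", fake_vlm))
# → ['locate the cup', 'grasp the cup', 'place it on the tray']
```

Keeping the model behind a plain callable makes the parsing logic testable without network access and lets the same planner target different backends.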
Comparative study on the oblique water-entry of a high-speed projectile based on rigid-body and elastic-plastic body models (Cited by: 1)
Authors: Xiangyan Liu, Xiaowei Cai, Zhengui Huang, Yu Hou, Jian Qin, Zhihua Chen. Defence Technology (防务技术), 2025, Issue 4, pp. 133-155 (23 pages)
To examine the similarities and differences in the evolution of cavity, wetting, and dynamics of a high-speed, oblique water-entry projectile at different positive angles of attack, a comparative analysis has been conducted based on the numerical results of two mathematical models: the rigid-body model and the fluid-structure interaction model. In addition, the applicable scope of the two methods and the structural response characteristics of the projectile have also been investigated. Our results demonstrate that: (1) In the rigid-body method, the impact loads and angular motion of the projectile are more likely to exhibit periodic variations due to the periodic tail slap; its applicable range of positive angles of attack is about α < 2°. (2) When the projectile undergoes significant wetting, a strong coupling effect is observed among wetting, structural deformation, and projectile motion. With the applied projectile shape, it is observed that, when the projectile bends, the final wetting position is that of Part B (cylinder of body). Once this occurs, the projectile ballistics become completely unstable. (3) The force exerted on the lower surface of the projectile by wetting is the primary cause of the destabilization of the projectile trajectory and of structural deformation failure. Bending deformation is most likely to appear at the junction of Part C (cone of body) and Part D (tail). The safe range of angles of attack for projectile stability is found to be about α ≤ 2°.
Keywords: fluid-structure interaction; rigid-body model; elastic-plastic model; structural deformation; impact loads; structural safety of projectile