Funding: Supported by the National Natural Science Foundation of China (Nos. 62371323, 62401380, U2433217, U2333209, and U20A20161); the Natural Science Foundation of Sichuan Province, China (No. 2025ZNSFSC1476); the Sichuan Science and Technology Program, China (Nos. 2024YFG0010 and 2024ZDZX0046); the Institutional Research Fund from Sichuan University (No. 2024SCUQJTX030); and the Open Fund of Key Laboratory of Flight Techniques and Flight Safety, CAAC (No. GY2024-01A).
Abstract: With the advent of the next-generation Air Traffic Control (ATC) system, there is growing interest in using Artificial Intelligence (AI) techniques to enhance Situation Awareness (SA) for ATC Controllers (ATCOs), i.e., Intelligent SA (ISA). However, existing AI-based SA approaches often rely on unimodal data and lack a comprehensive description and benchmark of ISA tasks that use multi-modal data in real-time ATC environments. To address this gap, the ISA task is refined, by analyzing the situation awareness procedure of the ATCOs, into the processing of two primary elements: spoken instructions and flight trajectories. ISA is then formulated as two tasks, Controlling Intent Understanding (CIU) and Flight Trajectory Prediction (FTP). For the CIU task, an automatic speech recognition and understanding framework is designed to extract the controlling intent from unstructured and continuous ATC communications. For the FTP task, single- and multi-horizon FTP approaches are investigated to support high-precision prediction of the situation evolution. A total of 32 unimodal and multi-modal methods, together with extensive evaluation metrics, are benchmarked on a real-world multi-modal ATC situation dataset. Experimental results demonstrate the effectiveness of AI-based techniques in enhancing ISA for the ATC environment.
Abstract: Calvin Wee first set foot in China at the age of 14, not for a diplomatic forum or a youth initiative, but on a week-long school trip to Guangzhou. Yet even then, the experience left a lasting impression. “We visited a local school and sat in on several classes,” he recalled. “The math lessons stood out: the students’ intensity and focus were striking. In Singapore, we study hard, but this was on another level.” That early glimpse into China’s education system stayed with him, and years later, it resurfaced as he began to engage more deeply with the Chinese language and culture. A pocket-sized wuxia xiaoshuo (martial arts novel), picked up on a whim in a Guangzhou bookstore, opened the door to martial arts fiction and sparked a lasting interest in Chinese literature and, eventually, China Studies.
Abstract: Since rising to fame in 1987 on the television drama Sunshine After Rain, Singaporean artist Edmund Chen has starred in many films and TV series. He has also been active in theatre, hosting programs, and, more recently, publishing. Chen has authored 15 picture books and two art activity books for children, earning recognition as Southeast Asia’s Best Emerging Children’s Author. Today, he remains a prominent public figure, not just as an actor but also as a writer and illustrator.
Funding: Supported by a grant from the Ministry of Research, Innovation and Digitization, CNCS/CCCDI-UEFISCDI, project number COFUND-CETP-SMART-LEM-1, within PNCDI IV.
Abstract: Extracting data from visually rich documents and charts using traditional methods that rely on OCR-based parsing poses multiple challenges, including layout complexity in unstructured formats, limitations in recognizing visual elements, correlations between different parts of a document, and domain-specific semantics. Simply extracting text is not sufficient; advanced reasoning capabilities are proving essential to analyze content and answer questions accurately. This paper evaluates the ability of Large Language Models (LLMs) to correctly answer questions about various types of charts, comparing their performance when using images as input versus directly parsing PDF files. To retrieve the images from the PDF, ColPali, a model leveraging state-of-the-art visual language models, is used to identify the relevant page containing the appropriate chart for each question. Google’s Gemini multimodal models were used to answer a set of questions through two approaches: 1) processing images derived from PDF documents and 2) directly utilizing the content of the same PDFs. Our findings underscore the limitations of traditional OCR-based approaches in visually rich document understanding (VrDU) and demonstrate the advantages of multimodal methods in both data extraction and reasoning tasks. Through structured benchmarking of chart question answering (CQA) across input formats, our work contributes to the advancement of chart understanding (CU) and the broader field of multimodal document analysis. Using two diverse and information-rich sources, the World Health Statistics 2024 report by the World Health Organisation and the Global Banking Annual Review 2024 by McKinsey & Company, we examine the performance of multimodal LLMs across different input modalities, comparing their effectiveness in processing charts as images versus parsing directly from PDF content. These documents were selected for their multimodal nature, combining dense textual analysis with varied visual representations, thus presenting realistic challenges for vision-language models. The comparison assesses how advanced models perform with different input formats and whether an image-based approach enhances chart comprehension in terms of accurate data extraction and reasoning capabilities.
Funding: The 2022 Guangdong Provincial Higher Education Teaching Quality and Reform Project “Research and Practice of English Teaching Integrating Ideological and Political Education into the Introduction to Chinese Culture Course Based on UbD Theory.”
Abstract: This study examines the application of the Understanding by Design (UbD) approach to enhance students’ cognitive, affective, and psychomotor learning domains, as well as their intercultural communication competence, in the Introduction to Chinese Culture course. UbD, a curriculum design framework emphasizing deep understanding over rote memorization, employs a “backward design” process to help students achieve a profound comprehension of Chinese culture and its modern implications. Through this approach, students also develop critical intercultural communication skills. The study offers strategies for integrating English language teaching with Chinese cultural education, providing practical insights for curriculum development that bridges linguistic and cultural learning.
Funding: 2022 Guangzhou Municipal Higher Education Teaching Research and Reform Key Research Project: “Innovation and Practice of English Reading Curriculum Ideological and Political in Colleges and Universities: A Study on the Integration of Understanding Contemporary China: An English Reading and Writing Textbook and English Reading Teaching” (2023JGZDXM001).
Abstract: Based on the ADDIE model, we explored a path for integrating Understanding Contemporary China: An English Reading and Writing Textbook with the main textbook used in college English reading. First, the basis for textbook integration was analyzed. Next, the overall design scheme for the integration was laid out. Finally, a series of teaching resources was developed and used in teaching practice. Testing showed that integrated teaching of the two sets of textbooks under the ADDIE model achieved good teaching results and promoted three aspects: reading literacy, disciplinary literacy, and value leadership.
Abstract: This is a research report on the interrelationship among understanding, memory, and oral expression in listening comprehension training. The research surveyed 150 non-English majors at a higher vocational college through a questionnaire. The results indicate that the interrelationship among understanding, memory, and oral expression in listening classes, which is determined by the listening materials and the teaching method, directly influences students' listening proficiency and speaking ability. It is suggested that strategies be used to foster students' abilities in these three areas so as to improve their listening ability.
Abstract: The author attempts to interpret the theme of A Passage to India by analyzing its major events and characterization. The thesis argues that E. M. Forster highlights the failure of aspired understanding between different races, between different people, and even within a single person. In other words, people often aspire to understanding, but in reality true understanding is very difficult to achieve.
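Abstract: To address the problems that existing 3D visual grounding methods rely on expensive sensor equipment, incur high system costs, and lack accuracy and robustness in complex multi-target localization, a multi-target 3D visual grounding method based on monocular images is proposed. Combined with natural language descriptions, the method identifies multiple 3D targets in a single RGB image. To this end, a multi-target visual grounding dataset, Mmo3DRefer, is constructed, and a cross-modal matching network, TextVizNet, is designed. TextVizNet generates 3D bounding boxes of targets with a pretrained monocular detector and uses an information fusion module and an information alignment module to deeply integrate visual and linguistic information, thereby achieving text-guided multi-target 3D detection. Experimental comparisons with five methods, including CORE-3DVG (Contextual Objects and RElations for 3D Visual Grounding), 3DVG-Transformer, and Multi3DRefer (Multiple 3D object Referencing dataset and task), show that, compared with the second-best method Multi3DRefer, TextVizNet improves the F1-score, precision, and recall on the Mmo3DRefer dataset by 8.92%, 8.39%, and 9.57%, respectively. This significantly improves the accuracy of text-based multi-target localization in complex scenes and provides effective support for practical applications such as autonomous driving and intelligent robots.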
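The three metrics reported above are linked by standard definitions. The following is a minimal sketch, assuming predicted boxes have already been matched to ground-truth targets so that only counts of true positives, false positives, and false negatives remain. The matching criterion (for example, a 3D IoU threshold), the function name `prf1`, and the example counts are illustrative assumptions, not details taken from the paper.

```python
from typing import Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from match counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one scene: 3 correct boxes, 1 spurious, 2 missed.
p, r, f = prf1(tp=3, fp=1, fn=2)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Because F1 is a harmonic mean, it always lies between precision and recall, which is consistent with the reported F1 gain (8.92%) falling between the precision and recall gains.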
Abstract: Large-scale proteomics studies can refine our understanding of health and disease and enable precision medicine. Here, we provide a detailed atlas of 2,920 plasma proteins linked to diseases (406 prevalent and 660 incident) and 986 health-related traits in 53,026 individuals (median follow-up: 14.8 years) from the UK Biobank, representing the most comprehensive proteome profiles to date. This atlas revealed 168,100 protein-disease associations and 554,488 protein-trait associations.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12374109) and the National Key Research and Development Program of China (Grant No. 2023YFA1406600).
Abstract: In physics, our expectations for system behavior are often guided by intuitive arithmetic. For systems composed of identical units, we anticipate synergy of the contributions from these units, where 1+1=2. Conversely, for systems built from opposing units, we expect cancellation of their contributions, where 1-1=0. This intuitive arithmetic has long underpinned our understanding of the physical properties of materials, from electronic transport to optical responses. However, scientific breakthroughs often occur when nature reveals ways to circumvent these seemingly fundamental rules, opening new possibilities that challenge our deepest assumptions about material behavior.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 62266054), the Major Science and Technology Project of Yunnan Province (Grant No. 202402AD080002), and the Scientific Research Fund of the Yunnan Provincial Department of Education (Grant No. 2025Y0302).
Abstract: End-to-end Temporal Action Detection (TAD) has achieved remarkable progress in recent years, driven by innovations in model architectures and the emergence of Video Foundation Models (VFMs). However, existing TAD methods that perform full fine-tuning of pretrained video models often incur substantial computational costs, which become particularly pronounced when processing long video sequences. Moreover, the need for precise temporal boundary annotations makes data labeling extremely expensive. In low-resource settings where annotated samples are scarce, direct fine-tuning tends to cause overfitting. To address these challenges, we introduce Dynamic Low-Rank Adapter (DyLoRA), a lightweight fine-tuning framework tailored specifically for the TAD task. Built upon the Low-Rank Adaptation (LoRA) architecture, DyLoRA adapts only the key layers of the pretrained model via low-rank decomposition, reducing the number of trainable parameters to less than 5% of those in full fine-tuning methods. This significantly lowers memory consumption and mitigates overfitting in low-resource settings. Notably, DyLoRA enhances the temporal modeling capability of pretrained models by optimizing temporal dimension weights, thereby alleviating the representation misalignment of temporal features. Experimental results demonstrate that DyLoRA-TAD achieves 73.9% mAP on THUMOS14, 39.52% on ActivityNet-1.3, and 28.2% on Charades, substantially surpassing the best traditional feature-based methods.
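The low-rank decomposition that DyLoRA builds on can be illustrated with the generic LoRA update, W' = W + (alpha / r) · B·A, where W is frozen and only the small matrices A and B are trained. The sketch below shows plain LoRA only; it does not reproduce DyLoRA's layer selection or temporal weighting, which the abstract does not specify. The helper names, matrix sizes, and `alpha` value are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the adapter rank."""
    r = len(a)  # A has shape (r, d_in); B has shape (d_out, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_fraction(d_out, d_in, r):
    """Parameters trained by LoRA divided by parameters in the full matrix."""
    return r * (d_in + d_out) / (d_in * d_out)


d_out, d_in, r = 6, 8, 2
rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
b = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init

# With B initialised to zero, the adapted weight equals the frozen weight,
# so fine-tuning starts exactly from the pretrained model.
assert lora_weight(w, a, b, alpha=4.0) == w

# Per-layer fraction for a hypothetical 768x768 projection at rank 8.
print(trainable_fraction(768, 768, 8))
```

The per-layer fraction here describes a single hypothetical matrix and is not comparable to the whole-model figure of less than 5% quoted in the abstract, which also depends on which layers are adapted.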
Abstract: The Qinghai-Xizang Plateau, known as the Roof of the World and the Water Tower of Asia, is recognized as the Earth’s Third Pole. It functions as a vital ecological security barrier and a strategic resource reserve for China, while also serving as an important conservation area that reflects the unique culture of the Chinese nation. Conducting the Second Comprehensive Scientific Expedition to the Qinghai-Xizang Plateau is essential for gaining valuable insights into the scientific protection of the region.
Abstract: In the field of intelligent surveillance, weakly supervised video anomaly detection (WSVAD) has garnered widespread attention as a key technology that identifies anomalous events using only video-level labels. Although multiple instance learning (MIL) has long dominated WSVAD, its reliance solely on video-level labels without semantic grounding hinders a fine-grained understanding of visually similar yet semantically distinct events. In addition, insufficient temporal modeling obscures causal relationships between events, making anomaly decisions reactive rather than reasoning-based. To overcome these limitations, this paper proposes an adaptive knowledge-based guidance method that integrates external structured knowledge. The approach combines hierarchical category information with learnable prompt vectors. It then constructs continuously updated contextual references within the feature space, enabling fine-grained meaning-based guidance over video content. Building on this, the work introduces an event relation analysis module. This module explicitly models temporal dependencies and causal correlations between video snippets. It constructs an evolving logic chain of anomalous events, revealing the process by which isolated anomalous snippets develop into a complete event. Experiments on multiple benchmark datasets show that the proposed method achieves highly competitive performance, with an AUC of 88.19% on UCF-Crime and an AP of 86.49% on XD-Violence. More importantly, the method provides temporal and causal explanations derived from event relationships alongside its detection results. This capability advances WSVAD from simple binary classification to interpretable behavior analysis.
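The AUC figure quoted above is the area under the ROC curve, which equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison. It assumes per-frame anomaly scores and binary labels, which is a common protocol but is not stated in the abstract; the function name and the example values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical anomaly scores for four frames; label 1 marks an anomalous frame.
frame_scores = [0.1, 0.4, 0.35, 0.8]
frame_labels = [0, 0, 1, 1]
print(roc_auc(frame_scores, frame_labels))  # 0.75: three of four pairs ranked correctly
```

The pairwise form is quadratic in the number of samples, so it suits small checks like this one; a sort-based implementation is preferable for full benchmark videos.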
Funding: Funding from Grant No. HIDSS-0002 DASHH (Data Science in Hamburg-Helmholtz Graduate School for the Structure of Matter); partially supported by the Helmholtz Imaging platform through the project “Smart Phase.”
Abstract: Understanding the complex plasma dynamics in ultra-intense relativistic laser-solid interactions is of fundamental importance for applications of laser-plasma-based particle accelerators, the creation of high-energy-density matter, planetary science, and laser-driven fusion energy. However, experimental efforts in this regime have been limited by the inaccessibility of over-critical densities and the poor spatiotemporal resolution of conventional diagnostics. Over the last decade, the advent of femtosecond brilliant hard X-ray free-electron lasers (XFELs) has opened new horizons to overcome these limitations. Here, for the first time, we present full-scale spatiotemporal measurements of solid-density plasma dynamics, including preplasma generation with scale lengths of tens of nanometers driven by the leading edge of a relativistic laser pulse, ultrafast heating and ionization at the main pulse arrival, the laser-driven blast wave, and transient surface return current-induced compression dynamics up to hundreds of picoseconds after interaction. These observations are enabled by a novel combination of advanced X-ray diagnostics, including small-angle X-ray scattering, resonant X-ray emission spectroscopy, and propagation-based X-ray phase-contrast imaging, used simultaneously at the European XFEL-HED beamline station.
Funding: Supported by Russian Science Foundation Grant No. 24-62-00032.
Abstract: Experiments with interacting high-velocity flows in a laser plasma can help answer fundamental questions in plasma physics and improve understanding of the mechanisms behind some astrophysical phenomena, such as the formation of collisionless shock waves, deceleration of accretion flows, and evolution of solar and stellar flares. This work presents the first direct experimental observations of stagnation and redirection of counterstreaming flows (jets) of laser plasma induced by intense laser pulses with intensity I ~ 2×10^18 W/cm^2. Hybrid particle-in-cell-fluid modeling, which takes into account the kinetic effects of ion motion and the evolution of the pressure tensor for electrons, demonstrates the compression of counterdirected toroidal self-generated magnetic fields embedded in counterstreaming plasma flows. The enhancement of the toroidal magnetic field in the interaction region results in plasma flow stagnation and redirection of the jets across the line of their initial propagation.